Abstract. Recent research has demonstrated significant achievable performance gains by exploiting
circularity/non-circularity or properness/improperness of complex-valued signals. In this paper, we
investigate the influence of these properties on important information theoretic quantities such as
entropy, divergence, and capacity. We prove two maximum entropy theorems that strengthen previously
known results. The proof of the former theorem is based on the so-called circular analog of a given
complex-valued random vector. Its introduction is supported by a characterization theorem that employs
a minimum Kullback-Leibler divergence criterion. In the proof of the latter theorem, on the other hand,
results about the second-order structure of complex-valued random vectors are exploited. Furthermore,
we address the capacity of multiple-input multiple-output (MIMO) channels. Regardless of the specific
distribution of the channel parameters (noise vector and channel matrix, if modeled as random), we show
that the capacity-achieving input vector is circular for a broad range of MIMO channels (including
coherent and noncoherent scenarios). Finally, we investigate the situation of an improper and Gaussian
distributed noise vector. We compute both capacity and capacity-achieving input vector and show that
improperness increases capacity, provided that the complementary covariance matrix is exploited.
Otherwise, a capacity loss occurs, for which we derive an explicit expression.

Index terms. Differential entropy, Kullback-Leibler divergence, mutual information, capacity,
circular/non-circular, proper/improper, circular analog, multiple-input multiple-output (MIMO).

1 Introduction



Complex-valued signals are central in many scientific fields including communications, array processing,
acoustics and optics, oceanography and geophysics, machine learning, and biomedicine. In recent research
(for a comprehensive overview see [3]) it has been shown that exploiting circularity/properness of
complex-valued signals or lack of it (non-circularity/improperness) can significantly enhance the
performance of the applied signal processing techniques. More specifically, for the field of
communications, it has been observed that important digital modulation schemes including binary phase
shift keying (BPSK), pulse amplitude modulation (PAM), Gaussian minimum shift keying (GMSK), offset
quaternary phase shift keying (OQPSK), and baseband (but not passband) orthogonal frequency division
multiplexing (OFDM), which is commonly called discrete multitone (DMT), (potentially) produce
non-circular/improper complex baseband signals, see e.g., [4-9]. Non-circular/improper baseband
communication signals can also arise due to imbalance between their in-phase and quadrature (I/Q)
components, and several techniques for compensating for I/Q imbalance have been proposed [10-12].

Information theory, on the other hand, addresses fundamental performance limits of communication systems
and also has a large impact on many other scientific areas where stochastic models are used. Therefore,
results about the most relevant information theoretic concepts, such as entropy, divergence, and
capacity, are of special interest. Clearly, information theory has the potential to study the performance
limits of signal processing algorithms and communications systems that exploit circularity/properness or
non-circularity/improperness. However, results in this direction are limited and have to be investigated
further. A significant disadvantage of available results is that they often rely on a Gaussian
assumption, which does not always hold in practice.

Apparently the first information theoretic result in this context analyzes the differential entropy of
complex-valued random vectors [13]. More specifically, the maximum entropy theorem in [13] shows that the
differential entropy of a zero-mean complex-valued random vector with given covariance matrix is upper
bounded by the differential entropy of a circular (and, consequently, zero-mean and proper) Gaussian
distributed complex-valued random vector with the same covariance matrix. Using this result, capacity
results for vector-valued (multiple-input multiple-output; MIMO) channels with complex-valued input and
complex-valued output and additive circular/proper Gaussian noise have been derived [14]. In particular,
it has been shown that the capacity-achieving input vector is Gaussian distributed and circular/proper.

Let us suppose that we are dealing with a complex-valued random vector which is known to be non-Gaussian.
In this situation, the upper bound on its differential entropy given by the mentioned maximum entropy
theorem turns out not to be tight. The same holds if the complex-valued random vector is known to be
improper. Hence, there are two sources that decrease the differential entropy of a complex-valued random
vector, i.e., non-Gaussianity and improperness. An important contribution of this paper is to derive
improved/tighter maximum entropy theorems for both situations.

The maximum entropy theorem in [13] associates a circular random vector (i.e., the Gaussian distributed
one) to a given complex-valued random vector. As pointed out, this choice does not always lead to the
smallest change in differential entropy. This raises the question of how we can associate a circular
random vector to an (in general) non-circular complex-valued random vector in a canonical way without
forcing it to be Gaussian distributed. The choice we propose is intuitive, and is furthermore supported
by a characterization theorem that is based on a minimum Kullback-Leibler divergence criterion. It also
leads to the desired improved entropy upper bound for the case in which the random vector is known to be
non-Gaussian. A study of further properties complements its analysis.

As already mentioned, the maximum entropy theorem in [13] does not yield a tight upper bound either if
the random vector is known to be improper. Extending our work of [1, 2], we derive an improved maximum
entropy theorem which addresses this situation. As a by-product, we obtain a criterion for a matrix to be
a valid complementary covariance matrix (also termed pseudo-covariance matrix [13]). We note that after
our initial work [1, 2] the obtained characterization of complementary covariance matrices has been
extended in [15, 16]. Meanwhile, expressions for the differential entropy of improper Gaussian random
vectors have appeared in the literature as well [3, 17].

Finally, we apply the obtained improved maximum entropy theorems to derive novel capacity results for
complex-valued MIMO channels with additive noise vectors. Without making use of any Gaussian assumption
(in contrast to [14]), we show that capacity is achieved by circular random vectors for a broad range of
channels. These results include both the case of a deterministic channel matrix and the case of a random
channel matrix, which is assumed to be either known to the receiver (coherent capacity) or unknown
(noncoherent capacity). On the other hand, we investigate the capacity of channels whose noise is
non-circular/improper and Gaussian distributed. Such channels have been shown to occur if, e.g., DMT is
used as modulation scheme [8,9]. Note that DMT is currently employed in several xDSL standards [18]. We
derive capacity expressions for two cases: (i) we assume that the knowledge of the complementary
covariance matrix is taken into account (both at transmitter and receiver); (ii) we assume that it is
erroneously believed (i.e., that the transceiver is designed assuming) that the noise has a vanishing
complementary covariance matrix, so that the information contained in the complementary covariance matrix
is ignored. This results in a decreased capacity and we calculate the occurring capacity loss.
Notation. The $n \times n$ identity matrix is denoted by $I_n$. We use the superscript $[\cdot]^T$ for
transposition and the superscript $[\cdot]^H = ([\cdot]^T)^*$ for Hermitian transposition, where the
superscript $[\cdot]^*$ stands for complex conjugation. $j = \sqrt{-1}$ denotes the imaginary unit,
$\Re\{\cdot\}$ and $\Im\{\cdot\}$ are real and imaginary part, respectively, and $E\{\cdot\}$ refers to
usual expectation. Throughout the paper, $\log(\cdot)$ denotes the logarithm taken with respect to an
arbitrary but fixed base. Therefore, all results are valid regardless of the chosen unit for differential
entropy (nats or bits).

Outline. The remainder of this paper is organized as follows. In Section 2, we introduce our framework
and present initial results about the distribution and second-order properties of complex-valued random
vectors. Section 3 deals with the question of how to circularize complex-valued random vectors and
analyzes the proposed method. The differential entropy of complex-valued random vectors is addressed in
Section 4 and two improved maximum entropy theorems are proved. Finally, in Section 5, we present various
capacity results for complex-valued MIMO channels.

2 Framework and Preliminary Results 

We consider complex-valued random vectors $x \in \mathbb{C}^n$. We assume that $x^{(r)} \in
\mathbb{R}^{2n}$, where $x^{(r)} = [\Re\{x^T\}\ \Im\{x^T\}]^T$ is defined by stacking of real and
imaginary part of $x$, is distributed according to a joint multivariate $2n$-dimensional probability
density function (pdf) $f_{x^{(r)}}(\xi)$. More precisely, it is assumed that the measure^1 defining the
distribution of $x^{(r)}$ is absolutely continuous with respect to $\lambda_{2n}$, where $\lambda_{2n}$
denotes the $2n$-dimensional Lebesgue measure [19]. Accordingly, whenever an integral appears in this
paper, integration is meant with respect to the Lebesgue measure of appropriate dimension. Note that when
we refer to the distribution of $x$, we mean the distribution of $x^{(r)}$ defined by the pdf
$f_{x^{(r)}}(\xi)$. Hence, a complex-valued random vector $x$ will be called Gaussian distributed if
$x^{(r)}$ is (multivariate) Gaussian distributed.

Definition 2.1 A complex-valued random vector $x \in \mathbb{C}^n$ is said to be circular, if $x$ has the
same distribution as $e^{j2\pi\theta}x$ for all $\theta \in [0,1[$, otherwise it is said to be
non-circular. The set of all circular complex-valued random vectors $x \in \mathbb{C}^n$, whose
distribution is absolutely continuous with respect to $\lambda_{2n}$, is denoted by $\mathcal{C}_n$.

It is well known, see e.g., [1,3,13,20], that for a complete second-order characterization of a
complex-valued random vector $x \in \mathbb{C}^n$ not only mean vector $m_x = E\{x\}$ and covariance
matrix $C_x = E\{(x - m_x)(x - m_x)^H\}$ but also complementary covariance matrix
$P_x = E\{(x - m_x)(x - m_x)^T\}$ are required. Note that both mean vector and complementary covariance
matrix of a circular complex-valued random vector are vanishing provided that its first- and second-order
moments exist [3].

^1 Here, we refer to the measure defined on the Borel $\sigma$-field on $\mathbb{R}^{2n}$ induced by the
measurable function defining the random vector.

Definition 2.2 A complex-valued random vector $x \in \mathbb{C}^n$ is said to be proper, if its
complementary covariance matrix vanishes, otherwise it is said to be improper.

Hence, circularity implies properness (under the assumption of existing first- and second-order moments). 
Note that a zero-mean and proper Gaussian random vector is circular. 
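
As a small numerical illustration of covariance, complementary covariance, and Definition 2.2 (a sketch under stated assumptions, not part of the original development), the following Python snippet evaluates both second-order quantities for two scalar ($n = 1$), equiprobable constellations. The helper name `second_order_stats` and the chosen constellations are illustrative assumptions. Discrete constellations do not possess a pdf as required by the framework above, but their second-order quantities are still well defined.

```python
def second_order_stats(points):
    """Return (mean, covariance, complementary covariance) of a scalar
    complex random variable that is uniformly distributed on `points`."""
    n = len(points)
    mean = sum(points) / n
    centered = [p - mean for p in points]
    cov = sum(abs(c) ** 2 for c in centered) / n   # C_x = E{|x - m_x|^2}
    pcov = sum(c * c for c in centered) / n        # P_x = E{(x - m_x)^2}
    return mean, cov, pcov


bpsk = [1 + 0j, -1 + 0j]            # real-valued symbols
qpsk = [1 + 0j, 1j, -1 + 0j, -1j]   # four-fold rotationally symmetric symbols

_, c_bpsk, p_bpsk = second_order_stats(bpsk)
_, c_qpsk, p_qpsk = second_order_stats(qpsk)

print("BPSK:", c_bpsk, p_bpsk)   # complementary covariance equals covariance: improper
print("QPSK:", c_qpsk, p_qpsk)   # complementary covariance vanishes: proper
```

The QPSK example is proper although its distribution is invariant only under rotations by multiples of a quarter turn, which is consistent with the statement that circularity implies properness but not conversely.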

2.1 Polar and Sheared-Polar Representation 

Here, we present some auxiliary results about the distribution of complex-valued random vectors. Let us
denote by $T^{(p\to r)}$ the mapping

$$T^{(p\to r)}:\ (\mathbb{R}_0^+)^n \times ([0,1[)^n \to \mathbb{R}^{2n},$$
$$[r_1 \cdots r_n\ \phi_1 \cdots \phi_n]^T \mapsto [r_1\cos(2\pi\phi_1) \cdots r_n\cos(2\pi\phi_n)\ \ r_1\sin(2\pi\phi_1) \cdots r_n\sin(2\pi\phi_n)]^T,$$

where $\mathbb{R}_0^+$ denotes the set of non-negative reals. There exists the inverse
$T^{(r\to p)} = (T^{(p\to r)})^{-1}$, provided that we set $\phi_i = 0$ for $r_i = 0$, $i = 1, \ldots, n$.
Note that the set by which the domain of $T^{(p\to r)}$ is reduced according to this convention has
measure zero with respect to $\lambda_{2n}$. In the following, $x^{(r)}$ will be called real
representation of $x$, whereas $x^{(p)} = T^{(r\to p)}(x^{(r)})$ will be denoted as polar representation
of $x$.

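Lemma 2.3 Suppose $x \in \mathbb{C}^n$ is a complex-valued random vector, which is distributed according
to the pdf $f_{x^{(r)}}(\xi)$. Then, the pdf of its polar representation $x^{(p)}$ is given by

$$f_{x^{(p)}}(r_1,\ldots,r_n,\phi_1,\ldots,\phi_n) = \begin{cases} (2\pi)^n (r_1 \cdots r_n)\, f_{x^{(r)}}\big(T^{(p\to r)}(r_1,\ldots,r_n,\phi_1,\ldots,\phi_n)\big), & (r_1,\ldots,r_n,\phi_1,\ldots,\phi_n) \in (\mathbb{R}_0^+)^n \times ([0,1[)^n \\ 0, & \text{otherwise} \end{cases}$$

almost everywhere with respect to $\lambda_{2n}$ ($\lambda_{2n}$-a.e.) [19].

Proof. Follows from

$$\int_A f_{x^{(p)}}(\xi)\,d\xi = \int_{T^{(p\to r)}(A)} f_{x^{(r)}}(\xi)\,d\xi = \int_A f_{x^{(r)}}\big(T^{(p\to r)}(\xi)\big)\,\big|J_{T^{(p\to r)}}(\xi)\big|\,d\xi,$$

for all Lebesgue measurable sets $A \subset (\mathbb{R}_0^+)^n \times ([0,1[)^n$, where the Jacobian
determinant $J_{T^{(p\to r)}}(\xi)$ of $T^{(p\to r)}$ is easily computed as
$J_{T^{(p\to r)}}(r_1,\ldots,r_n,\phi_1,\ldots,\phi_n) = (2\pi)^n (r_1 \cdots r_n)$ [21]. □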
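
The Jacobian factor in Lemma 2.3 can be sanity-checked numerically. The sketch below assumes $n = 1$ and a circular Gaussian with unit variance, whose real-representation pdf is $\frac{1}{\pi}\exp(-\xi_1^2 - \xi_2^2)$; the function names and the integration grid are illustrative choices. It builds the polar pdf according to the lemma and verifies by a midpoint rule that it integrates to approximately one over $\mathbb{R}_0^+ \times [0,1[$.

```python
import math


def pdf_real(xi1, xi2):
    """pdf of the real representation of a scalar circular Gaussian with unit variance."""
    return math.exp(-(xi1 ** 2 + xi2 ** 2)) / math.pi


def pdf_polar(r, phi):
    """Polar-representation pdf according to Lemma 2.3 with n = 1."""
    if r < 0 or not (0 <= phi < 1):
        return 0.0
    xi1 = r * math.cos(2 * math.pi * phi)
    xi2 = r * math.sin(2 * math.pi * phi)
    return 2 * math.pi * r * pdf_real(xi1, xi2)   # Jacobian factor (2*pi)^n * r_1 ... r_n


def integrate_polar(r_max=8.0, n_r=2000, n_phi=50):
    """Midpoint-rule integral of pdf_polar over [0, r_max] x [0, 1[."""
    dr, dphi = r_max / n_r, 1.0 / n_phi
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for k in range(n_phi):
            total += pdf_polar(r, (k + 0.5) * dphi) * dr * dphi
    return total


total_mass = integrate_polar()
print(total_mass)   # close to 1
```

For this example the polar pdf reduces to $2r\,e^{-r^2}$ and does not depend on the phase, as expected for a circular random variable.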
Observing that the pdf of $y^{(p)}_{(\theta)}$ of the random vector $y_{(\theta)} = e^{j2\pi\theta}x$
(with $\theta \in [0,1[$ being deterministic) satisfies
$f_{y^{(p)}_{(\theta)}}(r_1,\ldots,r_n,\phi_1,\ldots,\phi_n) = f_{x^{(p)}}(r_1,\ldots,r_n,[\phi_1-\theta]_{[0,1[},\ldots,[\phi_n-\theta]_{[0,1[})$
$\lambda_{2n}$-a.e., where the notation $[\cdot]_{[0,1[}$ is shorthand for modulo with respect to the
interval $[0,1[$, we obtain the following corollary.

Corollary 2.4 A complex-valued random vector $x \in \mathbb{C}^n$ is circular if and only if the pdf of
its polar representation $x^{(p)}$ satisfies

$$f_{x^{(p)}}(r_1,\ldots,r_n,\phi_1,\ldots,\phi_n) = f_{x^{(p)}}(r_1,\ldots,r_n,[\phi_1-\theta]_{[0,1[},\ldots,[\phi_n-\theta]_{[0,1[}) \quad \forall\,\theta \in [0,1[ \quad \lambda_{2n}\text{-a.e.}$$
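Let us denote by $T^{(s\to p)}$ the mapping

$$T^{(s\to p)}:\ (\mathbb{R}_0^+)^n \times ([0,1[)^n \to (\mathbb{R}_0^+)^n \times ([0,1[)^n,$$
$$[r_1 \cdots r_n\ \phi_1 \cdots \phi_n]^T \mapsto [r_1 \cdots r_n\ [\phi_1+\phi_n]_{[0,1[} \cdots [\phi_{n-1}+\phi_n]_{[0,1[}\ \phi_n]^T,$$

which is one-to-one with inverse $T^{(p\to s)} = (T^{(s\to p)})^{-1}$ given by

$$T^{(p\to s)}:\ (\mathbb{R}_0^+)^n \times ([0,1[)^n \to (\mathbb{R}_0^+)^n \times ([0,1[)^n,$$
$$[r_1 \cdots r_n\ \phi_1 \cdots \phi_n]^T \mapsto [r_1 \cdots r_n\ [\phi_1-\phi_n]_{[0,1[} \cdots [\phi_{n-1}-\phi_n]_{[0,1[}\ \phi_n]^T.$$

This follows immediately from the identity

$$[\phi]_{[0,1[} = \phi + n(\phi), \quad \phi \in \mathbb{R}, \qquad (1)$$

where $n(\phi) \in \mathbb{Z}$. In the following,
$x^{(s)} = T^{(p\to s)}(x^{(p)}) = T^{(p\to s)}\big(T^{(r\to p)}(x^{(r)})\big)$ will be called
sheared-polar representation of $x$.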
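
To make the two mappings concrete, the following Python sketch implements $T^{(p\to s)}$ and $T^{(s\to p)}$ for a vector given as a list of complex numbers. The function names, the example vector, and the rotation angle are assumptions made for illustration only. The example also shows what a rotation $e^{j2\pi\theta}x$ does in the sheared-polar representation: the first $n-1$ phase entries (phase differences) stay the same and only the last entry is shifted by $\theta$ modulo 1.

```python
import cmath
import math


def to_polar(x):
    """Polar representation: radii r_i and normalized phases phi_i in [0, 1[."""
    radii = [abs(z) for z in x]
    phases = [(cmath.phase(z) / (2 * math.pi)) % 1.0 if z != 0 else 0.0 for z in x]
    return radii, phases


def polar_to_sheared(radii, phases):
    """T^(p->s): subtract the last phase from all other phases, modulo [0, 1[."""
    last = phases[-1]
    return radii, [(p - last) % 1.0 for p in phases[:-1]] + [last]


def sheared_to_polar(radii, phases):
    """T^(s->p): add the last phase back to all other phases, modulo [0, 1[."""
    last = phases[-1]
    return radii, [(p + last) % 1.0 for p in phases[:-1]] + [last]


def circ_dist(a, b):
    """Distance between two phases on the circle of unit circumference."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


# Hypothetical example vector (n = 3) and rotation angle.
x = [cmath.rect(1.0, 2 * math.pi * 0.10),
     cmath.rect(2.0, 2 * math.pi * 0.70),
     cmath.rect(0.5, 2 * math.pi * 0.90)]
theta = 0.25
y = [cmath.exp(2j * math.pi * theta) * z for z in x]

rx, px = to_polar(x)
_, sx = polar_to_sheared(rx, px)
_, sy = polar_to_sheared(*to_polar(y))
print(sx)   # phase differences followed by phi_n
print(sy)   # same phase differences, last entry shifted by theta (mod 1)
```

Phases are compared with `circ_dist` rather than plain subtraction because floating-point rounding can place a value on either side of the wrap-around at 0 and 1.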

Lemma 2.5 Suppose $x \in \mathbb{C}^n$ is a complex-valued random vector. Then, the pdfs of its polar
representation $x^{(p)}$ and its sheared-polar representation $x^{(s)}$ are related according to

$$f_{x^{(s)}}(r_1,\ldots,r_n,\phi_1,\ldots,\phi_n) = f_{x^{(p)}}(r_1,\ldots,r_n,[\phi_1+\phi_n]_{[0,1[},\ldots,[\phi_{n-1}+\phi_n]_{[0,1[},\phi_n) \quad \lambda_{2n}\text{-a.e.},$$
$$f_{x^{(p)}}(r_1,\ldots,r_n,\phi_1,\ldots,\phi_n) = f_{x^{(s)}}(r_1,\ldots,r_n,[\phi_1-\phi_n]_{[0,1[},\ldots,[\phi_{n-1}-\phi_n]_{[0,1[},\phi_n) \quad \lambda_{2n}\text{-a.e.}$$

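Proof. Observe that the measure defining the distribution of $x^{(s)}$ is absolutely continuous with
respect to $\lambda_{2n}$, since $\lambda_{2n}(N) = 0$ implies $\lambda_{2n}(T^{(s\to p)}(N)) = 0$ for all
$N \subset (\mathbb{R}_0^+)^n \times ([0,1[)^n$, as can be seen by distinction of cases according to the
modulo-$[0,1[$ operation. Since $T^{(s\to p)}$ is not continuous on its whole domain, we define the
auxiliary mapping

$$\hat T^{(s\to p)}:\ (\mathbb{R}_0^+)^n \times \mathbb{R}^n \to (\mathbb{R}_0^+)^n \times \mathbb{R}^n, \quad [r_1 \cdots r_n\ \phi_1 \cdots \phi_n]^T \mapsto [r_1 \cdots r_n\ (\phi_1+\phi_n) \cdots (\phi_{n-1}+\phi_n)\ \phi_n]^T,$$

which is one-to-one with inverse $\hat T^{(p\to s)} = (\hat T^{(s\to p)})^{-1}$, where

$$\hat T^{(p\to s)}:\ (\mathbb{R}_0^+)^n \times \mathbb{R}^n \to (\mathbb{R}_0^+)^n \times \mathbb{R}^n, \quad [r_1 \cdots r_n\ \phi_1 \cdots \phi_n]^T \mapsto [r_1 \cdots r_n\ (\phi_1-\phi_n) \cdots (\phi_{n-1}-\phi_n)\ \phi_n]^T.$$

Its Jacobian determinant is identically $J_{\hat T^{(s\to p)}}(r_1,\ldots,r_n,\phi_1,\ldots,\phi_n) = 1$.
Suppose $A \subset (\mathbb{R}_0^+)^n \times ([0,1[)^n$ is any set of the form
$A = [a_1,b_1[ \times \cdots \times [a_{2n},b_{2n}[$. From (1) it follows that there exists a finite
partition $\{A_1,\ldots,A_N\}$ of $A$, i.e., $A = \bigcup_{i=1}^N A_i$ and $A_i \cap A_j = \emptyset$ for
$i \neq j$, such that $\hat T^{(p\to s)}\big(T^{(s\to p)}(A)\big) = \bigcup_{i=1}^N (A_i + k_i)$, where
$k_i \in \{0\}^n \times \mathbb{Z}^{n-1} \times \{0\}$. Here, $A_i + k_i$ denote the disjoint sets
$A_i + k_i = \{\xi \in \mathbb{R}^{2n} : \xi - k_i \in A_i\}$, $i = 1,\ldots,N$. Note that the partition is
caused by the modulo-$[0,1[$ operation used in the definition of $T^{(s\to p)}$ and corresponds to the
required distinction of cases when investigating $\hat T^{(p\to s)}\big(T^{(s\to p)}(A)\big)$. Therefore,

$$\int_A f_{x^{(s)}}(\xi)\,d\xi = \int_{T^{(s\to p)}(A)} f_{x^{(p)}}(\xi)\,d\xi = \int_{\hat T^{(p\to s)}(T^{(s\to p)}(A))} f_{x^{(p)}}\big(\hat T^{(s\to p)}(\xi)\big)\,d\xi = \sum_{i=1}^N \int_{A_i} f_{x^{(p)}}\big(\hat T^{(s\to p)}(\xi + k_i)\big)\,d\xi \overset{(*)}{=} \int_A f_{x^{(p)}}\big(T^{(s\to p)}(\xi)\big)\,d\xi,$$

where $(*)$ follows from (1) and the fact that
$\hat T^{(s\to p)}(\xi + k_i) \in (\mathbb{R}_0^+)^n \times ([0,1[)^n$ for $\xi \in A_i$. This implies the
statement, see e.g., [19]. □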
Combining Corollary 2.4 and Lemma 2.5, while applying (1), yields the following corollary.

Corollary 2.6 A complex-valued random vector $x \in \mathbb{C}^n$ is circular if and only if the pdf of
its sheared-polar representation $x^{(s)}$ does not depend on $\phi_n$, i.e.,

$$f_{x^{(s)}}(r_1,\ldots,r_n,\phi_1,\ldots,\phi_n) = f_{x^{(s)}}(r_1,\ldots,r_n,\phi_1,\ldots,\phi_{n-1}) \quad \lambda_{2n}\text{-a.e.}$$

2.2 Second-Order Properties

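In the following, we establish some results about covariance and complementary covariance matrices of
complex-valued random vectors. For a given complex-valued matrix $A \in \mathbb{C}^{n\times m}$, let us
denote by $\bar A \in \mathbb{R}^{2n\times 2m}$ and $\tilde A \in \mathbb{R}^{2n\times 2m}$ the
real-valued matrices

$$\bar A = \begin{bmatrix} \Re\{A\} & -\Im\{A\} \\ \Im\{A\} & \Re\{A\} \end{bmatrix} \quad \text{and} \quad \tilde A = \begin{bmatrix} \Re\{A\} & \Im\{A\} \\ \Im\{A\} & -\Re\{A\} \end{bmatrix}. \qquad (2)$$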

This notation allows a simple expression of the covariance matrix of the real representation
$C_{x^{(r)}}$ of a complex-valued random vector $x$ in terms of covariance matrix $C_x$ and complementary
covariance matrix $P_x$ as [1,3,13,20]

$$C_{x^{(r)}} = \tfrac{1}{2}\bar C_x + \tfrac{1}{2}\tilde P_x. \qquad (3)$$

Furthermore, $A$, $\bar A$, and $\tilde A$ satisfy remarkable algebraic properties, as stated by the next
lemma.

Lemma 2.7

$$C = AB \iff \bar C = \bar A \bar B \iff \tilde C = \bar A \tilde B \qquad (4a)$$
$$C = AB^* \iff \tilde C = \tilde A \bar B \qquad (4b)$$
$$C = A^H \iff \bar C = \bar A^T \qquad (4c)$$
$$U \in \mathbb{C}^{n\times n} \text{ unitary} \iff \bar U \in \mathbb{R}^{2n\times 2n} \text{ orthonormal} \qquad (4d)$$
$$\det \bar A = |\det A|^2 = \det(AA^H), \quad A \in \mathbb{C}^{n\times n} \qquad (4e)$$

Proof. For some of the statements, see also [14]. Direct calculations yield (4a) and (4b). (4c) follows
from the definition of $\bar A$. A combination of (4a) and (4c), while observing that
$\bar I_n = I_{2n}$, yields (4d). Finally, for (4e),

$$\det \bar A = \det\left\{ \begin{bmatrix} I_n & jI_n \\ 0 & I_n \end{bmatrix} \bar A \begin{bmatrix} I_n & -jI_n \\ 0 & I_n \end{bmatrix} \right\} = \det \begin{bmatrix} A & 0 \\ \Im\{A\} & A^* \end{bmatrix} = \det A \,\det A^*.$$

□

We are especially interested in the eigenvalues of $\tilde P_x$. We will show that they are essentially
given by the singular values of $P_x$. Note that the singular value decomposition (SVD) [22] of a matrix
$A \in \mathbb{C}^{n\times m}$ factorizes $A$ into three matrices, i.e., $A = U \Lambda V^H$. It is well
defined for all rectangular complex matrices and yields unitary matrices $U \in \mathbb{C}^{n\times n}$
and $V \in \mathbb{C}^{m\times m}$ and a diagonal matrix $\Lambda \in \mathbb{R}^{n\times m}$, i.e.,
$\Lambda = \mathrm{diag}^{n\times m}\{\lambda_1,\ldots,\lambda_{\min\{n,m\}}\}$, with non-negative entries
on its main diagonal, the singular values. $U$ and $V$ can be chosen such that the singular values are
ordered in descending order. In case the matrix $A \in \mathbb{C}^{n\times n}$ is symmetric (not
Hermitian), i.e., $A^T = A$, there is a special SVD known as Takagi factorization [23]. It is given by
the factorization

$$A = Q \Lambda Q^T, \qquad (5)$$

where the columns of $Q$ are the orthonormal eigenvectors of $AA^H$ and the diagonal matrix $\Lambda$ has
the singular values of $A$ on its main diagonal.

Proposition 2.8 Suppose $x \in \mathbb{C}^n$ is a complex-valued random vector with complementary
covariance matrix $P_x \in \mathbb{C}^{n\times n}$. Then, there exist a unitary matrix
$Q_x \in \mathbb{C}^{n\times n}$ and a diagonal matrix $\Lambda_x \in \mathbb{R}^{n\times n}$ with
non-negative entries, such that

$$\tilde P_x = \bar Q_x \tilde\Lambda_x \bar Q_x^T$$

represents the eigenvalue decomposition of $\tilde P_x$. The diagonal entries of $\Lambda_x$ are the
singular values of $P_x$. In particular,
$\tilde\Lambda_x = \mathrm{diag}^{2n\times 2n}\{\Lambda_x, -\Lambda_x\}$.

Proof. Consider the Takagi factorization (5) of the symmetric $P_x$ and apply Lemma 2.7, i.e.,

$$\tilde P_x = \widetilde{Q_x (\Lambda_x Q_x^T)} = \bar Q_x\, \widetilde{\Lambda_x Q_x^T} = \bar Q_x\, \widetilde{\Lambda_x (Q_x^H)^*} = \bar Q_x \tilde\Lambda_x \overline{Q_x^H} = \bar Q_x \tilde\Lambda_x \bar Q_x^T,$$

which represents the eigenvalue decomposition of $\tilde P_x$, since $\bar Q_x$ is orthonormal according
to (4d). □
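
The real-valued matrices from (2) and some of the identities above can be checked numerically. The following sketch uses plain nested lists. The helper names and the test matrices are illustrative assumptions, with Gaussian-integer entries so that all comparisons are exact. It checks $\overline{AB} = \bar A \bar B$, $\widetilde{AB} = \bar A \tilde B$, and $\widetilde{AB^*} = \tilde A \bar B$, and, for the scalar case $n = 1$, that $\tilde p$ has zero trace and determinant $-|p|^2$, so that its eigenvalues are $\pm|p|$ as stated by Proposition 2.8.

```python
def real_part(A):
    return [[z.real for z in row] for row in A]


def imag_part(A):
    return [[z.imag for z in row] for row in A]


def neg(M):
    return [[-v for v in row] for row in M]


def block(top_left, top_right, bottom_left, bottom_right):
    """Assemble a 2x2 block matrix from four equally sized blocks."""
    top = [l + r for l, r in zip(top_left, top_right)]
    bottom = [l + r for l, r in zip(bottom_left, bottom_right)]
    return top + bottom


def bar(A):
    """Real-valued matrix [[Re A, -Im A], [Im A, Re A]] from (2)."""
    re, im = real_part(A), imag_part(A)
    return block(re, neg(im), im, re)


def tilde(A):
    """Real-valued matrix [[Re A, Im A], [Im A, -Re A]] from (2)."""
    re, im = real_part(A), imag_part(A)
    return block(re, im, im, neg(re))


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def conj(A):
    return [[z.conjugate() for z in row] for row in A]


# Hypothetical test matrices with Gaussian-integer entries (exact in floating point).
A = [[1 + 2j, 0 - 1j], [3 + 0j, 2 + 2j]]
B = [[0 + 1j, 1 + 1j], [2 - 1j, -1 + 0j]]

check_bar = bar(matmul(A, B)) == matmul(bar(A), bar(B))
check_tilde = tilde(matmul(A, B)) == matmul(bar(A), tilde(B))
check_conj = tilde(matmul(A, conj(B))) == matmul(tilde(A), bar(B))
print(check_bar, check_tilde, check_conj)

# Scalar case n = 1: tilde(p) = [[Re p, Im p], [Im p, -Re p]].
p = 3 + 4j
P_tilde = tilde([[p]])
trace = P_tilde[0][0] + P_tilde[1][1]
det = P_tilde[0][0] * P_tilde[1][1] - P_tilde[0][1] * P_tilde[1][0]
print(trace, det)   # zero trace and det = -|p|^2, hence eigenvalues +|p| and -|p|
```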
An interesting question that is directly related to complex-valued random vectors is the characterization 
of the set of complementary covariance matrices. For covariance matrices such a characterization is well 
known, i.e., a matrix is a valid covariance matrix, which means that there exists a random vector with 
this covariance matrix, if and only if it is Hermitian and non-negative definite. In order to obtain an 
analogous result for complementary covariance matrices, we introduce the following notion.^2

Definition 2.9 A matrix $B \in \mathbb{C}^{n\times n}$ is said to be a generalized Cholesky factor of a
positive definite Hermitian matrix $A \in \mathbb{C}^{n\times n}$, if it satisfies $A = BB^H$.

Since $\det A = |\det B|^2$, a generalized Cholesky factor is always a non-singular matrix. Note that the
conventional Cholesky decomposition (cf. [22]), $A = LL^H$, where $L$ is lower-triangular, yields a
generalized Cholesky factor $L$. But there are also other ways of constructing a generalized Cholesky
factor. Let $A = UDU^H$ be the eigenvalue decomposition of $A$. For any matrix $T$ which satisfies
$D = TT^H$, $B = UT$ is a generalized Cholesky factor. Hence, a generalized Cholesky factor is not
uniquely defined. However, we have the following characterization.

Proposition 2.10 Suppose $B$ is a generalized Cholesky factor of $A$. Then, for any unitary matrix $U$,
$C = BU$ is also a generalized Cholesky factor. Conversely, if $B$ and $C$ are generalized Cholesky
factors, then there exists a unitary matrix $U$ such that $C = BU$.

Proof. For non-singular $B$ and $C$ we have

$$BB^H = CC^H \iff (B^{-1}C)^{-1} = (B^{-1}C)^H,$$

which implies both statements. □
The next theorem presents the promised criterion for a matrix to be a complementary covariance matrix.
More precisely, it is a criterion in terms of both covariance matrix and complementary covariance matrix.
We will call $\{C, P\}$ a valid pair of covariance matrix and complementary covariance matrix, if there
exists a complex-valued random vector with covariance matrix $C$ and complementary covariance matrix $P$.

Theorem 2.11 Suppose $C \in \mathbb{C}^{n\times n}$ is non-singular and $P \in \mathbb{C}^{n\times n}$.
Then, $\{C, P\}$ is a valid pair of covariance matrix and complementary covariance matrix if and only if
$C$ is Hermitian and non-negative definite, $P$ is symmetric, and the singular values of
$B^{-1}PB^{-T}$ are smaller or equal to 1, where $B$ denotes an arbitrary generalized Cholesky factor of
$C$.

^2 Cf. also the relation to the Karhunen-Loeve transform [24,25], also known as Hotelling transform [26],
and the Mahalanobis transform, e.g., [27] and references therein.

Proof. The requirements that $C$ is Hermitian and non-negative definite as well as that $P$ is symmetric
are obvious. Furthermore, observe that the singular values of $B^{-1}PB^{-T}$ do not depend on the choice
of the generalized Cholesky factor $B$.

Suppose we are given a complex-valued random vector $x \in \mathbb{C}^n$ with covariance matrix $C$ and
complementary covariance matrix $P$. Consider the random vector $y = B^{-1}x$. Clearly, $C_y = I_n$ and
$P_y = B^{-1}PB^{-T}$. From (3), i.e.,
$C_{y^{(r)}} = \tfrac{1}{2}\big(I_{2n} + \widetilde{B^{-1}PB^{-T}}\big)$, and the fact that
$C_{y^{(r)}}$ is non-negative definite, we conclude with Proposition 2.8 that the singular values of
$B^{-1}PB^{-T}$ are smaller or equal to 1.

Conversely, consider a complex-valued random vector $y$, e.g., a Gaussian distributed one, defined by the
covariance matrix of its real representation as
$C_{y^{(r)}} = \tfrac{1}{2}\big(I_{2n} + \widetilde{B^{-1}PB^{-T}}\big)$. According to Proposition 2.8,
such a random vector exists, since $C_{y^{(r)}}$ is Hermitian and non-negative definite provided that the
singular values of $B^{-1}PB^{-T}$ are smaller or equal to 1. It has covariance matrix $C_y = I_n$ and
complementary covariance matrix $P_y = B^{-1}PB^{-T}$, cf. (3). Then, the random vector $x = By$ has
covariance matrix $C_x = C$ and complementary covariance matrix $P_x = P$. □
Remarks. Apparently, the importance of the singular values of $B^{-1}PB^{-T}$ in the context of
complex-valued random vectors was first observed in [1] (for the above criterion and a generalized
maximum entropy theorem) and independently in [28], where they were introduced as canonical coordinates
[16,26,29-31] between a complex-valued random vector and its complex conjugate. Note that in [28], the
matrix $B^{-1}PB^{-T}$ is called coherence matrix between a random vector and its complex conjugate.
Interestingly, the approach of [28] differs from the approach taken here (and taken in [1]) in that [28]
employs a complex-valued augmented algebra to study second-order properties of complex-valued random
vectors, whereas (2) introduces a real-valued representation into real and imaginary parts. Later, in
[17], the singular values were also termed circularity coefficients and the whole set of singular values
was referred to as circularity spectrum. We also note that the condition of Theorem 2.11 on the singular
values can be equivalently expressed in terms of the Euclidean operator norm $\|\cdot\|_2$ as
$\|B^{-1}PB^{-T}\|_2 \le 1$.
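
For the scalar case $n = 1$, the criterion of Theorem 2.11 becomes very explicit, and the following sketch illustrates it. It assumes a real variance $c > 0$ and a complex complementary variance $p$; the function names and the example values are hypothetical. With $B = \sqrt{c}$, the only singular value of $B^{-1}pB^{-T}$ is $|p|/c$, and by (3) and Proposition 2.8 the covariance matrix of the real representation has the eigenvalues $\frac{1}{2}(c \pm |p|)$.

```python
import math


def is_valid_pair_scalar(c, p):
    """Theorem 2.11 for n = 1: B = sqrt(c) is a generalized Cholesky factor of c,
    and the only singular value of B^{-1} p B^{-T} is |p| / c."""
    if not (c > 0):
        return False
    return abs(p) / c <= 1.0


def real_cov_eigenvalues(c, p):
    """Eigenvalues of C_{x^(r)} = (bar(c) + tilde(p)) / 2 from (3), for n = 1."""
    a = 0.5 * (c + p.real)   # variance of the real part
    d = 0.5 * (c - p.real)   # variance of the imaginary part
    b = 0.5 * p.imag         # cross-covariance of real and imaginary part
    mean, radius = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius


# Hypothetical examples: |p| < c (valid pair) and |p| > c (no such random variable exists).
for c, p in [(1.0, 0.3 + 0.4j), (1.0, 1.2 + 0j)]:
    lo, hi = real_cov_eigenvalues(c, p)
    print(c, p, is_valid_pair_scalar(c, p), lo, hi)
```

The pair is valid exactly when the smaller eigenvalue is non-negative, which mirrors the argument used in the proof.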

3 Circular Analog of a Complex-Valued Random Vector

In this section we consider the following problem: suppose we are given a complex-valued random vector, 
which is non-circular. Can we find a random vector, which is as "similar" as possible to the original 
random vector but circular instead? Obviously, this depends on what is meant by "similar" and is, 
therefore, mainly a matter of definition. However, if we can show useful properties and/or theorems 
with this circularized random vector, its introduction is reasonable. Our approach for associating a
circular random vector to a (possibly) non-circular one is motivated by the well-known method used for
stationarizing a cyclostationary random process [32].

Definition 3.1 Suppose $x \in \mathbb{C}^n$ is a complex-valued random vector. Then, the random vector
$x_{(a)} = e^{j2\pi\psi}x$, where $\psi \in [0,1[$ is a uniformly distributed random variable independent
of $x$, is said to be circular analog of $x$.

In the following, we will show that the circular analog is indeed a circular random vector. The next
proposition expresses the distribution of $x_{(a)}$ in terms of the distribution of $x$ (for both polar
and sheared-polar representations).

Proposition 3.2 Suppose $x \in \mathbb{C}^n$ is a complex-valued random vector. Then, the pdfs of the
polar representations and sheared-polar representations of $x$ and its circular analog $x_{(a)}$ are
related according to

$$f_{x^{(p)}_{(a)}}(r_1,\ldots,r_n,\phi_1,\ldots,\phi_n) = \int_0^1 f_{x^{(p)}}(r_1,\ldots,r_n,[\phi_1-\psi]_{[0,1[},\ldots,[\phi_n-\psi]_{[0,1[})\,d\psi \quad \lambda_{2n}\text{-a.e.}, \qquad (6)$$
$$f_{x^{(s)}_{(a)}}(r_1,\ldots,r_n,\phi_1,\ldots,\phi_n) = \int_0^1 f_{x^{(s)}}(r_1,\ldots,r_n,\phi_1,\phi_2,\ldots,\phi_n)\,d\phi_n \quad \lambda_{2n}\text{-a.e.}, \qquad (7)$$

respectively.

Proof. For (6), consider the joint pdf of $x^{(p)}_{(a)}$ and $\psi$, i.e.,
$f_{x^{(p)}_{(a)},\psi}(r_1,\ldots,r_n,\phi_1,\ldots,\phi_n,\psi) = f_{x^{(p)}_{(a)}|\psi}(r_1,\ldots,r_n,\phi_1,\ldots,\phi_n|\psi)\,f_\psi(\psi) = f_{x^{(p)}}(r_1,\ldots,r_n,[\phi_1-\psi]_{[0,1[},\ldots,[\phi_n-\psi]_{[0,1[})$
and marginalize with respect to $\psi$. (7) follows from (6) using Lemma 2.5 and identity (1). □

Observe that $f_{x^{(s)}_{(a)}}$ does not depend on $\phi_n$ $\lambda_{2n}$-a.e., so that Corollary 2.6
implies circularity of $x_{(a)}$.

3.1 Divergence Characterization 

Here, we present a characterization of the circular analog of a complex-valued random vector that further
supports the chosen definition. It is based on the Kullback-Leibler divergence (or relative entropy)
[33, 34], which can be regarded as a distance measure between two probability measures. For
complex-valued random vectors, whose real representations are distributed according to multivariate pdfs,
the Kullback-Leibler divergence $D(x\|y)$ between $x \in \mathbb{C}^n$ and $y \in \mathbb{C}^n$ is
defined as

$$D(x\|y) \triangleq D(x^{(r)}\|y^{(r)}) = \int_{\mathbb{R}^{2n}} f_{x^{(r)}}(\xi) \log \frac{f_{x^{(r)}}(\xi)}{f_{y^{(r)}}(\xi)}\,d\xi \ \in\ \mathbb{R}_0^+ \cup \{\infty\},$$

where we set $0 \log 0 = 0$ and $0 \log \frac{0}{0} = 0$ (motivated by continuity). Here, $D(x\|y)$ is
finite only if the support set of $f_{x^{(r)}}$ is contained in the support set of $f_{y^{(r)}}$
$\lambda_{2n}$-a.e. Note that $D(x\|y) = 0$ if and only if $f_{x^{(r)}} = f_{y^{(r)}}$
$\lambda_{2n}$-a.e. [33]. The next lemma shows that $D(x\|y)$ can be equivalently expressed in terms of
polar and sheared-polar representations.
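Lemma 3.3 Suppose $x \in \mathbb{C}^n$ and $y \in \mathbb{C}^n$ are complex-valued random vectors. Then,
the Kullback-Leibler divergence $D(x\|y)$ can be computed from the respective polar and sheared-polar
representations of $x$ and $y$ according to

$$D(x\|y) = D(x^{(p)}\|y^{(p)}) = \int_{(\mathbb{R}_0^+)^n \times ([0,1[)^n} f_{x^{(p)}}(\xi) \log \frac{f_{x^{(p)}}(\xi)}{f_{y^{(p)}}(\xi)}\,d\xi = D(x^{(s)}\|y^{(s)}) = \int_{(\mathbb{R}_0^+)^n \times ([0,1[)^n} f_{x^{(s)}}(\xi) \log \frac{f_{x^{(s)}}(\xi)}{f_{y^{(s)}}(\xi)}\,d\xi.$$

Proof. With $A = (\mathbb{R}_0^+)^n \times ([0,1[)^n$,

$$D(x^{(r)}\|y^{(r)}) = \int_{\mathbb{R}^{2n}} f_{x^{(r)}}(\xi) \log \frac{f_{x^{(r)}}(\xi)}{f_{y^{(r)}}(\xi)}\,d\xi = \int_A f_{x^{(r)}}\big(T^{(p\to r)}(\xi)\big) \log \frac{f_{x^{(r)}}\big(T^{(p\to r)}(\xi)\big)}{f_{y^{(r)}}\big(T^{(p\to r)}(\xi)\big)}\,\big|J_{T^{(p\to r)}}(\xi)\big|\,d\xi = \int_A f_{x^{(p)}}(\xi) \log \frac{f_{x^{(p)}}(\xi)}{f_{y^{(p)}}(\xi)}\,d\xi = D(x^{(p)}\|y^{(p)}),$$

where Lemma 2.3 has been used. Furthermore, using the mapping $\hat T^{(s\to p)}$ and the appropriate
partition $\{A_1,\ldots,A_N\}$ of $A$, cf. the proof of Lemma 2.5,

$$D(x^{(p)}\|y^{(p)}) = \int_{T^{(s\to p)}(A)} f_{x^{(p)}}(\xi) \log \frac{f_{x^{(p)}}(\xi)}{f_{y^{(p)}}(\xi)}\,d\xi = \int_{\hat T^{(p\to s)}(T^{(s\to p)}(A))} f_{x^{(p)}}\big(\hat T^{(s\to p)}(\xi)\big) \log \frac{f_{x^{(p)}}\big(\hat T^{(s\to p)}(\xi)\big)}{f_{y^{(p)}}\big(\hat T^{(s\to p)}(\xi)\big)}\,d\xi$$
$$= \sum_{i=1}^N \int_{A_i} f_{x^{(p)}}\big(\hat T^{(s\to p)}(\xi + k_i)\big) \log \frac{f_{x^{(p)}}\big(\hat T^{(s\to p)}(\xi + k_i)\big)}{f_{y^{(p)}}\big(\hat T^{(s\to p)}(\xi + k_i)\big)}\,d\xi = \int_A f_{x^{(s)}}(\xi) \log \frac{f_{x^{(s)}}(\xi)}{f_{y^{(s)}}(\xi)}\,d\xi = D(x^{(s)}\|y^{(s)}),$$

where Lemma 2.5 has been used. □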
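We intend to prove a theorem which states that the circular analog has a smaller "distance" from the
given complex-valued random vector than any other circular random vector. To that end, consider the
sheared-polar representation of $x$, i.e., $x^{(s)} \in \mathbb{R}^{2n}$, and form the "reduced" vector
$\tilde x^{(s)} \in \mathbb{R}^{2n-1}$ by only taking the first $2n-1$ elements of $x^{(s)}$. Clearly, its
pdf is given by marginalization, i.e.,

$$f_{\tilde x^{(s)}}(\tilde\xi) = \int_0^1 f_{x^{(s)}}(r_1,\ldots,r_n,\phi_1,\ldots,\phi_n)\,d\phi_n, \quad \text{where } \tilde\xi = (r_1,\ldots,r_n,\phi_1,\ldots,\phi_{n-1}).$$

Furthermore, let $S_x \subset (\mathbb{R}_0^+)^n \times ([0,1[)^{n-1}$ denote the support set of
$f_{\tilde x^{(s)}}$. Note that $f_{\tilde x^{(s)}}(\tilde\xi) = 0$ is equivalent to
$f_{x^{(s)}}(\tilde\xi,\phi_n) = 0$ $\lambda_1$-a.e. (for fixed $\tilde\xi$). We have

$$f_{x^{(s)}}(\tilde\xi,\phi_n) = f_{\tilde x^{(s)}}(\tilde\xi)\, f_{\vartheta|\tilde x^{(s)}}(\phi_n|\tilde\xi), \quad \tilde\xi \in S_x,\ \phi_n \in [0,1[, \qquad (8)$$

where $\vartheta = (x^{(s)})_{2n}$ denotes the $2n$-th element of $x^{(s)}$.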
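Theorem 3.4 Suppose $x \in \mathbb{C}^n$ is a complex-valued random vector. Then, a circular random
vector $y \in \mathbb{C}^n$ is the circular analog of $x$, i.e., $y = x_{(a)}$, if and only if it
minimizes the Kullback-Leibler divergence to $x \in \mathbb{C}^n$ within the whole set of circular random
vectors, i.e., if and only if

$$D(x\|y) = \inf_{c \in \mathcal{C}_n} D(x\|c).$$

Furthermore,

$$D(x\|x_{(a)}) = \min_{c \in \mathcal{C}_n} D(x\|c) = \int_{S_x} f_{\tilde x^{(s)}}(\tilde\xi) \left( \int_0^1 f_{\vartheta|\tilde x^{(s)}}(\phi_n|\tilde\xi) \log f_{\vartheta|\tilde x^{(s)}}(\phi_n|\tilde\xi)\,d\phi_n \right) d\tilde\xi = -h(\vartheta|\tilde x^{(s)}),$$

where $h(\vartheta|\tilde x^{(s)})$ denotes the conditional differential entropy of $\vartheta$ given
$\tilde x^{(s)}$, cf. [33] and Definition 4.2, with $\vartheta$ and $\tilde x^{(s)}$ according to (8).

Proof. Suppose $c \in \mathcal{C}_n$ and consider its sheared-polar representation
$c^{(s)} \in \mathbb{R}^{2n}$. Due to the circularity of $c$,
$f_{c^{(s)}}(\xi) = f_{\tilde c^{(s)}}(\tilde\xi)$ $\lambda_{2n}$-a.e., and, according to Lemma 3.3,

$$D(x\|c) = \int_{S_x} \int_0^1 f_{\tilde x^{(s)}}(\tilde\xi)\, f_{\vartheta|\tilde x^{(s)}}(\phi_n|\tilde\xi) \log \frac{f_{\tilde x^{(s)}}(\tilde\xi)\, f_{\vartheta|\tilde x^{(s)}}(\phi_n|\tilde\xi)}{f_{\tilde c^{(s)}}(\tilde\xi)}\,d\phi_n\,d\tilde\xi$$
$$\overset{(*)}{=} D(\tilde x^{(s)}\|\tilde c^{(s)}) + \int_{S_x} f_{\tilde x^{(s)}}(\tilde\xi) \left( \int_0^1 f_{\vartheta|\tilde x^{(s)}}(\phi_n|\tilde\xi) \log f_{\vartheta|\tilde x^{(s)}}(\phi_n|\tilde\xi)\,d\phi_n \right) d\tilde\xi,$$

where $\tilde c^{(s)} \in \mathbb{R}^{2n-1}$ is the corresponding "reduced" vector of $c^{(s)}$. For the
validity of $(*)$, we also refer to [34, Theorem D.13]. It follows that

$$D(x\|c) \ge \int_{S_x} f_{\tilde x^{(s)}}(\tilde\xi) \left( \int_0^1 f_{\vartheta|\tilde x^{(s)}}(\phi_n|\tilde\xi) \log f_{\vartheta|\tilde x^{(s)}}(\phi_n|\tilde\xi)\,d\phi_n \right) d\tilde\xi \qquad (9)$$

and the infimum is achieved for $f_{\tilde c^{(s)}} = f_{\tilde x^{(s)}}$ $\lambda_{2n-1}$-a.e. Since for
the circular analog $x_{(a)}$ of $x$, $f_{x^{(s)}_{(a)}}(\xi) = f_{\tilde x^{(s)}}(\tilde\xi)$
$\lambda_{2n}$-a.e., and since $f_{c^{(s)}}(\xi) = f_{\tilde c^{(s)}}(\tilde\xi)$ $\lambda_{2n}$-a.e.,
the infimum is achieved if and only if $f_{c^{(s)}} = f_{x^{(s)}_{(a)}}$ $\lambda_{2n}$-a.e., i.e.,
$c = x_{(a)}$. □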



3.2 Complex-Valued Random Vectors with Finite Second-Order Moments

In this section, we establish important properties of the circular analog $x_{(a)}$ of a complex-valued
random vector $x$, whose second-order moments exist. Clearly, both mean vector and complementary
covariance matrix of $x_{(a)}$ are vanishing. For the covariance matrix, we have the following result.
Theorem 3.5 Suppose $x \in \mathbb{C}^n$ is a zero-mean complex-valued random vector with finite
second-order moments. Then, the covariance matrix of the circular analog $x_{(a)}$ equals the covariance
matrix of $x$, i.e., $C_{x_{(a)}} = C_x$.

Proof. For the correlation between the $k$th and $l$th entry of $x_{(a)}$,

$$E\{(x_{(a)})_k (x_{(a)})_l^*\} = \int_{\mathbb{R}^{2n}} (\xi_k + j\xi_{k+n})(\xi_l - j\xi_{l+n})\, f_{x^{(r)}_{(a)}}(\xi)\,d\xi = \int_{\mathbb{R}^{2n}} \int_0^1 (\xi_k + j\xi_{k+n})(\xi_l - j\xi_{l+n})\, f_{x^{(r)}_{(a)}|\psi}(\xi|\psi)\,d\psi\,d\xi$$
$$\overset{(*)}{=} \int_0^1 \int_{\mathbb{R}^{2n}} (\xi_k + j\xi_{k+n})(\xi_l - j\xi_{l+n})\, f_{x^{(r)}_{(a)}|\psi}(\xi|\psi)\,d\xi\,d\psi = \int_0^1 E\{(e^{j2\pi\psi}x)_k (e^{j2\pi\psi}x)_l^*\}\,d\psi = E\{(x)_k (x)_l^*\},$$

where $\psi$ denotes the uniformly distributed random variable used for defining $x_{(a)}$ (see
Definition 3.1) and $(*)$ follows from Fubini's Theorem [21]. □
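
The statements of this subsection can be illustrated by a small Monte Carlo experiment. The sketch below assumes a scalar ($n = 1$), real-valued, zero-mean Gaussian random variable as improper example, for which $P_x = C_x$; the sample size, the seed, and the helper name are illustrative assumptions. Its circular analog is drawn according to Definition 3.1, and the sample covariance stays the same while the sample complementary covariance becomes small.

```python
import cmath
import math
import random


def sample_moments(samples):
    """Sample covariance E{|x|^2} and complementary covariance E{x^2} (zero mean assumed)."""
    n = len(samples)
    cov = sum(abs(z) ** 2 for z in samples) / n
    pcov = sum(z * z for z in samples) / n
    return cov, pcov


rng = random.Random(0)   # fixed seed for reproducibility
N = 20000

# Improper scalar example: a real-valued zero-mean Gaussian, so that P_x = C_x.
x = [complex(rng.gauss(0.0, 1.0), 0.0) for _ in range(N)]

# Circular analog x_(a) = exp(j*2*pi*psi) * x with psi uniform on [0, 1[, independent of x.
x_a = [cmath.exp(2j * math.pi * rng.random()) * z for z in x]

cov_x, pcov_x = sample_moments(x)
cov_a, pcov_a = sample_moments(x_a)
print(cov_x, abs(pcov_x))   # both close to 1
print(cov_a, abs(pcov_a))   # covariance unchanged, complementary covariance close to 0
```

The covariance is preserved sample by sample because $|e^{j2\pi\psi}| = 1$, whereas the complementary covariance only vanishes on average over $\psi$, so a small residual remains for finite $N$.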
The following theorem states that the circular analog of an improper Gaussian distributed random vector
is non-Gaussian.

Theorem 3.6 Suppose $x \in \mathbb{C}^n$ is a zero-mean, complex-valued, and Gaussian distributed random
vector with $\|B_x^{-1}P_xB_x^{-T}\|_2 < 1$ such that its circular analog $x_{(a)}$ is Gaussian
distributed. Here, $B_x$ denotes a generalized Cholesky factor of the covariance matrix $C_x$ of $x$ and
$P_x$ denotes the complementary covariance matrix of $x$. Then, $x$ is proper.

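Proof. We first prove the theorem for the special case $\mathbf{C}_{\mathbf{x}} = \mathbf{I}_n$ and $\mathbf{P}_{\mathbf{x}} = \boldsymbol{\Lambda}_{\mathbf{x}}$, where $\boldsymbol{\Lambda}_{\mathbf{x}} \in \mathbb{R}^{n \times n}$ denotes a diagonal matrix with non-negative diagonal entries $\lambda_i < 1$. For fixed (deterministic) $\theta$, consider the random vector $\mathbf{y}(\theta) = e^{j2\pi\theta}\mathbf{x}$, which has covariance matrix $\mathbf{C}_{\mathbf{y}(\theta)} = \mathbf{I}_n$ and complementary covariance matrix $\mathbf{P}_{\mathbf{y}(\theta)} = e^{j4\pi\theta}\boldsymbol{\Lambda}_{\mathbf{x}}$. According to (3), the covariance matrix of its real representation $\mathbf{y}^{(r)}(\theta)$ is given by

$$
\mathbf{C}_{\mathbf{y}^{(r)}(\theta)} = \frac{1}{2}
\begin{pmatrix} \mathbf{I}_n & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_n \end{pmatrix}
+ \frac{1}{2}
\begin{pmatrix}
\cos(4\pi\theta)\boldsymbol{\Lambda}_{\mathbf{x}} & \sin(4\pi\theta)\boldsymbol{\Lambda}_{\mathbf{x}} \\
\sin(4\pi\theta)\boldsymbol{\Lambda}_{\mathbf{x}} & -\cos(4\pi\theta)\boldsymbol{\Lambda}_{\mathbf{x}}
\end{pmatrix},
$$

whose determinant is easily computed as

$$
\det \mathbf{C}_{\mathbf{y}^{(r)}(\theta)} = 2^{-2n} \prod_{i=1}^{n} (1 - \lambda_i^2).
$$

Furthermore, its inverse is calculated as

$$
\mathbf{C}_{\mathbf{y}^{(r)}(\theta)}^{-1} = 2
\begin{pmatrix} (\mathbf{I}_n - \boldsymbol{\Lambda}_{\mathbf{x}}^2)^{-1} & \mathbf{0} \\ \mathbf{0} & (\mathbf{I}_n - \boldsymbol{\Lambda}_{\mathbf{x}}^2)^{-1} \end{pmatrix}
- 2
\begin{pmatrix}
\cos(4\pi\theta)\mathbf{D}_{\mathbf{x}} & \sin(4\pi\theta)\mathbf{D}_{\mathbf{x}} \\
\sin(4\pi\theta)\mathbf{D}_{\mathbf{x}} & -\cos(4\pi\theta)\mathbf{D}_{\mathbf{x}}
\end{pmatrix},
$$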
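where $\mathbf{D}_{\mathbf{x}} = \boldsymbol{\Lambda}_{\mathbf{x}} (\mathbf{I}_n - \boldsymbol{\Lambda}_{\mathbf{x}}^2)^{-1}$. Therefore, the pdf of $\mathbf{y}^{(r)}(\theta)$ is given by

$$
f_{\mathbf{y}^{(r)}(\theta)}(\boldsymbol{\xi}) = \frac{1}{\pi^n \prod_{i=1}^{n} \sqrt{1 - \lambda_i^2}}
\exp\!\left( -\boldsymbol{\xi}^T
\begin{pmatrix} (\mathbf{I}_n - \boldsymbol{\Lambda}_{\mathbf{x}}^2)^{-1} & \mathbf{0} \\ \mathbf{0} & (\mathbf{I}_n - \boldsymbol{\Lambda}_{\mathbf{x}}^2)^{-1} \end{pmatrix}
\boldsymbol{\xi} \right)
\times \exp\!\left( \boldsymbol{\xi}^T
\begin{pmatrix}
\cos(4\pi\theta)\mathbf{D}_{\mathbf{x}} & \sin(4\pi\theta)\mathbf{D}_{\mathbf{x}} \\
\sin(4\pi\theta)\mathbf{D}_{\mathbf{x}} & -\cos(4\pi\theta)\mathbf{D}_{\mathbf{x}}
\end{pmatrix}
\boldsymbol{\xi} \right)
\quad \lambda_{2n}\text{-a.e.}
$$

Since $f_{\mathbf{x}_{(a)}^{(r)}}(\boldsymbol{\xi}) = \int_0^1 f_{\mathbf{y}^{(r)}(\theta)}(\boldsymbol{\xi})\, d\theta$ $\lambda_{2n}$-a.e., we obtain

$$
f_{\mathbf{x}_{(a)}^{(r)}}(\boldsymbol{\xi}) = \frac{1}{\pi^n \prod_{i=1}^{n} \sqrt{1 - \lambda_i^2}}
\exp\!\left( -\boldsymbol{\xi}^T
\begin{pmatrix} (\mathbf{I}_n - \boldsymbol{\Lambda}_{\mathbf{x}}^2)^{-1} & \mathbf{0} \\ \mathbf{0} & (\mathbf{I}_n - \boldsymbol{\Lambda}_{\mathbf{x}}^2)^{-1} \end{pmatrix}
\boldsymbol{\xi} \right)
\times I_0\!\left( \sqrt{ \left( \boldsymbol{\xi}^T \begin{pmatrix} \mathbf{D}_{\mathbf{x}} & \mathbf{0} \\ \mathbf{0} & -\mathbf{D}_{\mathbf{x}} \end{pmatrix} \boldsymbol{\xi} \right)^{2}
+ \left( \boldsymbol{\xi}^T \begin{pmatrix} \mathbf{0} & \mathbf{D}_{\mathbf{x}} \\ \mathbf{D}_{\mathbf{x}} & \mathbf{0} \end{pmatrix} \boldsymbol{\xi} \right)^{2} } \right)
\quad \lambda_{2n}\text{-a.e.}, \tag{10}
$$

where $I_0(x) = \int_0^1 \exp\big(x \cos(2\pi\theta)\big)\, d\theta$ is the modified Bessel function of the first kind of order zero [35].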

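Here, we have used the identity

$$
\begin{aligned}
\int_0^1 \exp\big(a\cos(2\pi\theta) + b\sin(2\pi\theta)\big)\, d\theta
&= \int_0^1 \exp\big(r\cos(2\pi\theta_0)\cos(2\pi\theta) + r\sin(2\pi\theta_0)\sin(2\pi\theta)\big)\, d\theta \\
&= \int_0^1 \exp\big(r\cos(2\pi(\theta - \theta_0))\big)\, d\theta = I_0(r),
\end{aligned}
$$

where $(r, \theta_0)$ denotes the polar coordinates of the complex number $a + jb$, i.e., $a + jb = r e^{j2\pi\theta_0}$. According to the assumptions of the theorem and to Theorem 3.5, $\mathbf{x}_{(a)}$ is Gaussian distributed with covariance matrix $\mathbf{C}_{\mathbf{x}_{(a)}} = \mathbf{I}_n$ and vanishing mean vector and complementary covariance matrix. Hence, (10) implies $\mathbf{D}_{\mathbf{x}} = \mathbf{0}$ and, therefore, the statement.

For the general case, apply the Takagi factorization to $\mathbf{B}_{\mathbf{x}}^{-1}\mathbf{P}_{\mathbf{x}}\mathbf{B}_{\mathbf{x}}^{-T}$, i.e., $\mathbf{B}_{\mathbf{x}}^{-1}\mathbf{P}_{\mathbf{x}}\mathbf{B}_{\mathbf{x}}^{-T} = \mathbf{Q}_{\mathbf{x}}\boldsymbol{\Lambda}_{\mathbf{x}}\mathbf{Q}_{\mathbf{x}}^{T}$, and consider the random vector $\mathbf{y} = \mathbf{Q}_{\mathbf{x}}^{H}\mathbf{B}_{\mathbf{x}}^{-1}\mathbf{x}$. Clearly, $\mathbf{C}_{\mathbf{y}} = \mathbf{I}_n$ and $\mathbf{P}_{\mathbf{y}} = \boldsymbol{\Lambda}_{\mathbf{x}}$, and both $\mathbf{y}$ and $\mathbf{y}_{(a)}$ are Gaussian distributed. From the special case, $\mathbf{P}_{\mathbf{y}} = \mathbf{0}$, and, in turn, $\mathbf{P}_{\mathbf{x}} = \mathbf{B}_{\mathbf{x}}\mathbf{Q}_{\mathbf{x}}\mathbf{P}_{\mathbf{y}}\mathbf{Q}_{\mathbf{x}}^{T}\mathbf{B}_{\mathbf{x}}^{T} = \mathbf{0}$. □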
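The closed-form determinant and inverse used in the special case of the proof, as well as the Bessel-function identity, are easy to check numerically. The sketch below is only an illustration under assumed values: the entries of `lam`, the angle `theta`, and the coefficients `a`, `b` are arbitrary, and the factor 2 and the minus sign in the inverse follow from a direct block-wise inversion.

```python
import numpy as np

lam = np.array([0.2, 0.5, 0.9])   # arbitrary diagonal entries in [0, 1)
theta = 0.137                     # arbitrary fixed rotation parameter
n = lam.size
Lam = np.diag(lam)
c, s = np.cos(4 * np.pi * theta), np.sin(4 * np.pi * theta)

# Real-representation covariance of y(theta) for C = I_n and P = exp(j 4 pi theta) * Lam.
C_real = 0.5 * np.eye(2 * n) + 0.5 * np.block([[c * Lam, s * Lam],
                                               [s * Lam, -c * Lam]])

# Closed-form determinant: 2^(-2n) * prod(1 - lam_i^2).
det_formula = 2.0 ** (-2 * n) * np.prod(1 - lam ** 2)

# Closed-form inverse with D = Lam (I - Lam^2)^(-1).
M = np.diag(1.0 / (1 - lam ** 2))
D = Lam @ M
inv_formula = 2 * np.kron(np.eye(2), M) - 2 * np.block([[c * D, s * D],
                                                        [s * D, -c * D]])

det_ok = np.isclose(np.linalg.det(C_real), det_formula)
inv_ok = np.allclose(np.linalg.inv(C_real), inv_formula)

# Identity: int_0^1 exp(a cos(2 pi t) + b sin(2 pi t)) dt = I_0(sqrt(a^2 + b^2)).
a, b = 0.7, -1.3
grid = np.arange(4096) / 4096     # uniform grid over one full period
lhs = np.mean(np.exp(a * np.cos(2 * np.pi * grid) + b * np.sin(2 * np.pi * grid)))
rhs = np.i0(np.hypot(a, b))
bessel_ok = np.isclose(lhs, rhs)
print(det_ok, inv_ok, bessel_ok)
```

Averaging over a uniform grid is used for the integral because the integrand is smooth and periodic, for which this rule converges very quickly.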



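4 Differential Entropy of Complex-Valued Random Vectors

As outlined in the introduction, we are interested in bounds on the differential entropy of complex-valued random vectors. We start with a series of definitions, which are required for the further development of the paper. Again, we make use of the convention $0 \log 0 = 0$ and $0 \log \frac{0}{0} = 0$.

Definition 4.1 The differential entropy $h(\mathbf{x})$ of a complex-valued random vector $\mathbf{x} \in \mathbb{C}^n$ is defined as the differential entropy of its real representation $\mathbf{x}^{(r)}$, i.e.,

$$
h(\mathbf{x}) \triangleq h(\mathbf{x}^{(r)}) \triangleq -\int f_{\mathbf{x}^{(r)}}(\boldsymbol{\xi}) \log f_{\mathbf{x}^{(r)}}(\boldsymbol{\xi})\, d\boldsymbol{\xi},
$$

provided that the integrand is integrable [19].

Definition 4.2 The conditional differential entropy $h(\mathbf{x}|\mathbf{y})$ of a complex-valued random vector $\mathbf{x} \in \mathbb{C}^n$ given a complex-valued random vector $\mathbf{y} \in \mathbb{C}^m$ is defined as the conditional differential entropy of the real representation $\mathbf{x}^{(r)}$ given the real representation $\mathbf{y}^{(r)}$, i.e.,

$$
h(\mathbf{x}|\mathbf{y}) \triangleq h(\mathbf{x}^{(r)}|\mathbf{y}^{(r)}) \triangleq -\int\!\!\int f_{\mathbf{x}^{(r)},\mathbf{y}^{(r)}}(\boldsymbol{\xi}, \boldsymbol{\eta}) \log \frac{f_{\mathbf{x}^{(r)},\mathbf{y}^{(r)}}(\boldsymbol{\xi}, \boldsymbol{\eta})}{f_{\mathbf{y}^{(r)}}(\boldsymbol{\eta})}\, d\boldsymbol{\xi}\, d\boldsymbol{\eta},
$$

provided that the integrand is integrable. Here, $f_{\mathbf{x}^{(r)},\mathbf{y}^{(r)}}(\boldsymbol{\xi}, \boldsymbol{\eta})$ denotes the joint pdf of $\mathbf{x}^{(r)}$ and $\mathbf{y}^{(r)}$, whereas $f_{\mathbf{y}^{(r)}}(\boldsymbol{\eta})$ denotes the marginal pdf of $\mathbf{y}^{(r)}$.

Definition 4.3 The mutual information $I(\mathbf{x}; \mathbf{y})$ between the complex-valued random vectors $\mathbf{x} \in \mathbb{C}^n$ and $\mathbf{y} \in \mathbb{C}^m$ is defined as the mutual information between their real representations $\mathbf{x}^{(r)}$ and $\mathbf{y}^{(r)}$, i.e.,

$$
I(\mathbf{x}; \mathbf{y}) \triangleq I(\mathbf{x}^{(r)}; \mathbf{y}^{(r)}) \triangleq \int\!\!\int f_{\mathbf{x}^{(r)},\mathbf{y}^{(r)}}(\boldsymbol{\xi}, \boldsymbol{\eta}) \log \frac{f_{\mathbf{x}^{(r)},\mathbf{y}^{(r)}}(\boldsymbol{\xi}, \boldsymbol{\eta})}{f_{\mathbf{x}^{(r)}}(\boldsymbol{\xi})\, f_{\mathbf{y}^{(r)}}(\boldsymbol{\eta})}\, d\boldsymbol{\xi}\, d\boldsymbol{\eta},
$$

where $f_{\mathbf{x}^{(r)},\mathbf{y}^{(r)}}(\boldsymbol{\xi}, \boldsymbol{\eta})$ denotes the joint pdf of $\mathbf{x}^{(r)}$ and $\mathbf{y}^{(r)}$, and $f_{\mathbf{x}^{(r)}}(\boldsymbol{\xi})$ and $f_{\mathbf{y}^{(r)}}(\boldsymbol{\eta})$ are the marginal pdfs of $\mathbf{x}^{(r)}$ and $\mathbf{y}^{(r)}$, respectively.

It is well known that these quantities satisfy the following relations,

$$
I(\mathbf{x}; \mathbf{y}) = h(\mathbf{x}) - h(\mathbf{x}|\mathbf{y}) = h(\mathbf{y}) - h(\mathbf{y}|\mathbf{x}) = h(\mathbf{x}) + h(\mathbf{y}) - h(\mathbf{x}, \mathbf{y}), \tag{11a}
$$

$$
I(\mathbf{x}; \mathbf{y}) \ge 0, \tag{11b}
$$

with equality in (11b) if and only if $\mathbf{x}$ and $\mathbf{y}$ are statistically independent. Furthermore, according to the next theorem, Gaussian distributed proper random vectors are known to be entropy maximizers.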

Theorem 4.4 [Neeser & Massey] Suppose $\mathbf{x} \in \mathbb{C}^n$ is a zero-mean complex-valued random vector with non-singular covariance matrix $\mathbf{C}_{\mathbf{x}}$. Then, the differential entropy of $\mathbf{x}$ satisfies

$$
h(\mathbf{x}) \le \log\det(\pi e\, \mathbf{C}_{\mathbf{x}}), \tag{12}
$$

with equality if and only if $\mathbf{x}$ is Gaussian distributed and circular/proper.

Proof. See e.g., [13, 14]. □

Remarks. Let us assume, for the moment, that $\mathbf{x}$ is known to be non-Gaussian. Clearly, the inequality (12) is strict in this case and $\log\det(\pi e\, \mathbf{C}_{\mathbf{x}})$ is not a tight upper bound for the differential entropy $h(\mathbf{x})$. Similarly, if $\mathbf{x}$ is known to be improper, the differential entropy $h(\mathbf{x})$ is strictly smaller than $\log\det(\pi e\, \mathbf{C}_{\mathbf{x}})$. Loosely speaking, there are two sources that decrease the differential entropy of a complex-valued random vector: non-Gaussianity and improperness. In the following, we will derive improved maximum entropy theorems that take this observation into account. While their application is not limited to the non-Gaussian and improper case, the obtained upper bounds are in general tighter for these two scenarios than the upper bound given by Theorem 4.4.

4.1 Maximum Entropy Theorem I 

We first prove a maximum entropy theorem that is especially suited to the non-Gaussian case. However, 
also for Gaussian distributed random vectors the obtained upper bound will turn out to be tighter 
than the one of Theorem 4.4. It associates a specific circular random vector to a given random vector 
and upper bounds the differential entropy of the given random vector by the differential entropy of the 
associated circular random vector. 

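Theorem 4.5 (Maximum Entropy Theorem for Complex-Valued Random Vectors I) Suppose $\mathbf{x} \in \mathbb{C}^n$ is a complex-valued random vector. Then, the differential entropies of $\mathbf{x}$ and its circular analog $\mathbf{x}_{(a)}$ satisfy

$$
h(\mathbf{x}) \le h(\mathbf{x}_{(a)}),
$$

with equality if and only if $\mathbf{x}$ is circular.

Proof. Since $\mathbf{x}_{(a)} = e^{j2\pi\psi}\mathbf{x}$ with $\psi$ independent of $\mathbf{x}$, we have for fixed (deterministic) $\varphi$

$$
h(\mathbf{x}_{(a)}|\psi = \varphi) = h(e^{j2\pi\varphi}\mathbf{x}) = h(\mathbf{x}),
$$

and, furthermore, by applying Fubini's theorem to Definition 4.2,

$$
h(\mathbf{x}_{(a)}|\psi) = \int_0^1 h(\mathbf{x}_{(a)}|\psi = \varphi)\, d\varphi = h(\mathbf{x}).
$$

Therefore,

$$
h(\mathbf{x}_{(a)}) - h(\mathbf{x}) = h(\mathbf{x}_{(a)}) - h(\mathbf{x}_{(a)}|\psi) = I(\mathbf{x}_{(a)}; \psi) \ge 0, \tag{13}
$$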
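where we have used (11a) and (11b). Here, $\mathbf{x}_{(a)}$ and $\psi$ are independent, i.e., $h(\mathbf{x}_{(a)}) = h(\mathbf{x})$, if and only if $\mathbf{x}_{(a)}^{(p)}$ and $\psi$ are independent. To investigate this independence, consider the joint pdf of $\mathbf{x}_{(a)}^{(p)}$ and $\psi$, i.e.,

$$
\begin{aligned}
f_{\mathbf{x}_{(a)}^{(p)},\psi}(r_1, \ldots, r_n, \phi_1, \ldots, \phi_n, \varphi)
&= f_{\mathbf{x}_{(a)}^{(p)}|\psi}(r_1, \ldots, r_n, \phi_1, \ldots, \phi_n|\varphi)\, f_{\psi}(\varphi) \\
&= f_{\mathbf{x}^{(p)}}\big(r_1, \ldots, r_n, \phi_1, \ldots, \phi_{n-1}, [\phi_n - \varphi]_{[0,1[}\big) \\
&= f_{\mathbf{x}^{(s)}}(\tilde{\boldsymbol{\xi}})\, f_{\vartheta|\mathbf{x}^{(s)}}\big([\phi_n - \varphi]_{[0,1[}\,\big|\,\tilde{\boldsymbol{\xi}}\big),
\end{aligned}
$$

where $\tilde{\boldsymbol{\xi}} = (r_1, \ldots, r_n, \phi_1, \ldots, \phi_{n-1}) \in \mathcal{S}_{\mathbf{x}}$, with $\mathcal{S}_{\mathbf{x}} \subseteq (\mathbb{R}_0^+)^n \times [0,1[^{\,n-1}$ being the support set of $f_{\mathbf{x}^{(s)}}$, cf. (8). Since $f_{\mathbf{x}_{(a)}^{(p)}}(\tilde{\boldsymbol{\xi}}, \phi_n) = f_{\mathbf{x}^{(s)}}(\tilde{\boldsymbol{\xi}})$ $\lambda_{2n}$-a.e. on $\mathcal{S}_{\mathbf{x}} \times [0,1[$ and $f_{\psi}(\varphi) = 1$ $\lambda_1$-a.e. on $[0,1[$, independence of $\mathbf{x}_{(a)}^{(p)}$ and $\psi$ is equivalent to

$$
f_{\vartheta|\mathbf{x}^{(s)}}\big([\phi_n - \varphi]_{[0,1[}\,\big|\,\tilde{\boldsymbol{\xi}}\big) = 1 \quad \lambda_{2n+1}\text{-a.e. on } [0,1[^{\,2} \times \mathcal{S}_{\mathbf{x}} \text{ as function of } (\phi_n, \varphi, \tilde{\boldsymbol{\xi}}). \tag{14}
$$

Transforming both sides of this equation according to $\phi_n' = [\phi_n - \varphi]_{[0,1[}$ and $\varphi' = \varphi$, a similar partitioning argument as in the proof of Lemma 2.5 shows that (14) is equivalent to

$$
f_{\vartheta|\mathbf{x}^{(s)}}\big(\phi_n'\,\big|\,\tilde{\boldsymbol{\xi}}\big) = 1 \quad \lambda_{2n+1}\text{-a.e. on } [0,1[^{\,2} \times \mathcal{S}_{\mathbf{x}} \text{ as function of } (\phi_n', \varphi', \tilde{\boldsymbol{\xi}}). \tag{15}
$$

Marginalization of both sides of (15) with respect to $\varphi'$ yields

$$
f_{\vartheta|\mathbf{x}^{(s)}}\big(\phi_n'\,\big|\,\tilde{\boldsymbol{\xi}}\big) = 1 \quad \lambda_{2n}\text{-a.e. on } [0,1[ \times \mathcal{S}_{\mathbf{x}} \text{ as function of } (\phi_n', \tilde{\boldsymbol{\xi}}),
$$

so that, according to Corollary 2.6 and (8), equality $h(\mathbf{x}_{(a)}) = h(\mathbf{x})$ implies circularity of $\mathbf{x}$. The converse statement follows from Theorem 3.4. □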
Remarks. Since $\mathbf{x}_{(a)}$ is non-Gaussian in general,* the upper bound in Theorem 4.5 is typically tighter than the upper bound in Theorem 4.4. Furthermore, Theorem 4.5 does not require finite second-order moments. The next corollary states that for improper Gaussian distributed random vectors the upper bound in Theorem 4.5 is strictly smaller than the upper bound in Theorem 4.4.

Corollary 4.6 Suppose $\mathbf{x} \in \mathbb{C}^n$ is a zero-mean, complex-valued, and Gaussian distributed random vector with non-singular covariance matrix $\mathbf{C}_{\mathbf{x}}$, such that $\|\mathbf{B}_{\mathbf{x}}^{-1}\mathbf{P}_{\mathbf{x}}\mathbf{B}_{\mathbf{x}}^{-T}\|_2 < 1$, where $\mathbf{B}_{\mathbf{x}}$ denotes a generalized Cholesky factor of $\mathbf{C}_{\mathbf{x}}$ and $\mathbf{P}_{\mathbf{x}}$ denotes the complementary covariance matrix of $\mathbf{x}$. Then, the differential entropy of its circular analog $\mathbf{x}_{(a)}$ satisfies

$$
h(\mathbf{x}_{(a)}) \le \log\det(\pi e\, \mathbf{C}_{\mathbf{x}}),
$$

with equality if and only if $\mathbf{x}$ is proper.

Proof. Since $\mathbf{x}_{(a)}$ is zero-mean with covariance matrix $\mathbf{C}_{\mathbf{x}_{(a)}} = \mathbf{C}_{\mathbf{x}}$, cf. Theorem 3.5, the inequality follows from Theorem 4.4. Furthermore, equality $h(\mathbf{x}_{(a)}) = \log\det(\pi e\, \mathbf{C}_{\mathbf{x}})$ implies Gaussianity of $\mathbf{x}_{(a)}$ and, according to Theorem 3.6, properness of $\mathbf{x}$. □

^The following technical derivation is required in order to show identical distributions of $e^{j2\pi\theta}\mathbf{x}$ for all $\theta \in [0,1[$ and not only for $\theta$ $\lambda_1$-a.e. on $[0,1[$.

*Note that it is possible to define an improper (non-Gaussian) random vector, such that its circular analog is Gaussian distributed. In this case, Theorem 4.5 does not yield an improvement over Theorem 4.4.
4.2 Maximum Entropy Theorem II 

Here, we prove a maximum entropy theorem that is especially suited to the improper case. The derivation 
is based on a maximum entropy theorem for real-valued random vectors. 

Theorem 4.7 (Maximum Entropy Theorem for Real-Valued Random Vectors) Suppose $\mathbf{x} \in \mathbb{R}^n$ is a real-valued random vector with non-singular covariance matrix $\mathbf{C}_{\mathbf{x}}$. Then, the differential entropy of $\mathbf{x}$ satisfies

$$
h(\mathbf{x}) \le \frac{1}{2}\log\det(2\pi e\, \mathbf{C}_{\mathbf{x}}),
$$

with equality if and only if $\mathbf{x}$ is Gaussian distributed.

Proof. For the proof of this theorem for zero-mean $\mathbf{x}$, see e.g., [33]. The general case, where $\mathbf{x}$ has a non-vanishing mean vector, follows immediately since both differential entropy and covariance matrix are invariant with respect to translations. □
We are now able to state the main theorem of this section. 

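Theorem 4.8 (Maximum Entropy Theorem for Complex-Valued Random Vectors II) Suppose $\mathbf{x} \in \mathbb{C}^n$ is a complex-valued random vector with non-singular covariance matrix $\mathbf{C}_{\mathbf{x}}$, such that $\|\mathbf{B}_{\mathbf{x}}^{-1}\mathbf{P}_{\mathbf{x}}\mathbf{B}_{\mathbf{x}}^{-T}\|_2 < 1$, where $\mathbf{B}_{\mathbf{x}}$ denotes a generalized Cholesky factor of $\mathbf{C}_{\mathbf{x}}$ and $\mathbf{P}_{\mathbf{x}}$ denotes the complementary covariance matrix of $\mathbf{x}$. Then, the differential entropy of $\mathbf{x}$ satisfies

$$
h(\mathbf{x}) \le \log\det(\pi e\, \mathbf{C}_{\mathbf{x}}) + \frac{1}{2}\sum_{i=1}^{n} \log(1 - \lambda_i^2),
$$

where $\lambda_i$ are the singular values of $\mathbf{B}_{\mathbf{x}}^{-1}\mathbf{P}_{\mathbf{x}}\mathbf{B}_{\mathbf{x}}^{-T}$, with equality if and only if $\mathbf{x}$ is Gaussian distributed.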
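Proof. According to Theorem 4.7,

$$
\begin{aligned}
h(\mathbf{x}) &\le \frac{1}{2}\log\det\big(2\pi e\, \mathbf{C}_{\mathbf{x}^{(r)}}\big) \\
&\overset{(4a),(4c)}{=} \log\det(\pi e\, \mathbf{C}_{\mathbf{x}}) + \frac{1}{2}\log\det\big(2\, \mathbf{C}_{\mathbf{y}^{(r)}}\big) \\
&\overset{(3)}{=} \log\det(\pi e\, \mathbf{C}_{\mathbf{x}}) + \frac{1}{2}\log\det\left( \mathbf{I}_{2n} + \begin{pmatrix} \mathrm{Re}\,\mathbf{P}_{\mathbf{y}} & \mathrm{Im}\,\mathbf{P}_{\mathbf{y}} \\ \mathrm{Im}\,\mathbf{P}_{\mathbf{y}} & -\mathrm{Re}\,\mathbf{P}_{\mathbf{y}} \end{pmatrix} \right) \\
&= \log\det(\pi e\, \mathbf{C}_{\mathbf{x}}) + \frac{1}{2}\log\prod_{i=1}^{n}(1 - \lambda_i^2),
\end{aligned}
$$

where the last identity follows from Proposition 2.8 applied to the random vector $\mathbf{y} = \mathbf{B}_{\mathbf{x}}^{-1}\mathbf{x}$. Note that $\mathbf{P}_{\mathbf{y}} = \mathbf{B}_{\mathbf{x}}^{-1}\mathbf{P}_{\mathbf{x}}\mathbf{B}_{\mathbf{x}}^{-T}$. We also conclude from the last expression that the non-singularity of $\mathbf{C}_{\mathbf{x}^{(r)}}$, which is required for the application of Theorem 4.7, is a direct consequence of the assumption $\|\mathbf{B}_{\mathbf{x}}^{-1}\mathbf{P}_{\mathbf{x}}\mathbf{B}_{\mathbf{x}}^{-T}\|_2 < 1$, since $\lambda_i \le \|\mathbf{B}_{\mathbf{x}}^{-1}\mathbf{P}_{\mathbf{x}}\mathbf{B}_{\mathbf{x}}^{-T}\|_2$. The equality criterion is obvious. □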

Remarks. Note that $\frac{1}{2}\sum_{i=1}^{n}\log(1 - \lambda_i^2) \le 0$ with equality if and only if $\mathbf{x}$ is proper, so that Theorem 4.8 implies Theorem 4.4. The upper bound in Theorem 4.8 is the differential entropy of a Gaussian distributed but in general non-circular/improper random vector with the same covariance matrix and complementary covariance matrix as $\mathbf{x}$, whereas the upper bound in Theorem 4.5 is the differential entropy of a circular but in general non-Gaussian random vector with the same covariance matrix as $\mathbf{x}$. Which of the two bounds is tighter depends on the situation, i.e., on the degree of improperness and non-Gaussianity; a general statement is not possible. However, for an improper Gaussian distributed random vector $\mathbf{x}$,

$$
\log\det(\pi e\, \mathbf{C}_{\mathbf{x}}) + \frac{1}{2}\sum_{i=1}^{n}\log(1 - \lambda_i^2) < h(\mathbf{x}_{(a)}),
$$

whereas for a circular non-Gaussian random vector $\mathbf{x}$,

$$
h(\mathbf{x}_{(a)}) < \log\det(\pi e\, \mathbf{C}_{\mathbf{x}}) + \frac{1}{2}\sum_{i=1}^{n}\log(1 - \lambda_i^2) = \log\det(\pi e\, \mathbf{C}_{\mathbf{x}}).
$$
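The relation between the bounds of Theorems 4.4 and 4.8 can be made concrete for an improper Gaussian vector. The sketch below is an illustration under stated assumptions: entropies are in nats, the helper names (`real_cov`, `bound_thm44`, `bound_thm48`, `gaussian_entropy`) and the matrices `A`, `B_mix` are hypothetical, and the ordinary Cholesky factor is used as one admissible generalized Cholesky factor. The test pair $(\mathbf{C}, \mathbf{P})$ is generated from $\mathbf{x} = \mathbf{A}\mathbf{u} + \mathbf{B}\mathbf{u}^*$ with proper white $\mathbf{u}$, which guarantees a valid pair.

```python
import numpy as np

def real_cov(C, P):
    """Covariance of [Re x; Im x] from covariance C and complementary covariance P."""
    return 0.5 * np.block([[np.real(C + P), np.imag(P - C)],
                           [np.imag(C + P), np.real(C - P)]])

def bound_thm44(C):
    """Upper bound log det(pi e C) of Theorem 4.4 (nats)."""
    return np.linalg.slogdet(np.pi * np.e * C)[1]

def bound_thm48(C, P):
    """Upper bound of Theorem 4.8: Theorem 4.4 bound plus 0.5 * sum log(1 - lam_i^2)."""
    B = np.linalg.cholesky(C)          # C = B B^H
    B_inv = np.linalg.inv(B)
    lam = np.linalg.svd(B_inv @ P @ B_inv.T, compute_uv=False)
    return bound_thm44(C) + 0.5 * np.sum(np.log(1 - lam ** 2))

def gaussian_entropy(C, P):
    """Entropy of a zero-mean Gaussian vector, computed via its real representation."""
    return 0.5 * np.linalg.slogdet(2 * np.pi * np.e * real_cov(C, P))[1]

# Hypothetical improper Gaussian vector x = A u + B_mix conj(u).
A = np.array([[1.0, 0.3j], [0.2, 0.8]])
B_mix = np.array([[0.4, 0.0], [0.1j, 0.3]])
C = A @ A.conj().T + B_mix @ B_mix.conj().T
P = A @ B_mix.T + B_mix @ A.T

h_x = gaussian_entropy(C, P)
b44 = bound_thm44(C)
b48 = bound_thm48(C, P)
print(h_x, b48, b44)
```

For this Gaussian example the bound of Theorem 4.8 is met with equality, while the bound of Theorem 4.4 is strictly larger, in line with the remarks above.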

5 Capacity of Complex-Valued Channels

In this section, we study the influence of circularity/properness versus non-circularity/improperness on channel capacity. In particular, we investigate vector-valued (MIMO) channels with complex-valued input and complex-valued output. For simplicity, we only consider linear channels with additive noise, i.e., channels of the form

$$
\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{z}, \tag{16}
$$

where $\mathbf{x} \in \mathbb{C}^m$, $\mathbf{y} \in \mathbb{C}^n$, and $\mathbf{z} \in \mathbb{C}^n$ denote transmit, receive, and noise vector, respectively, and $\mathbf{H} \in \mathbb{C}^{n \times m}$ is the channel matrix. Both† $\mathbf{x}$ and $\mathbf{z}$ are modeled as iid (only with respect to channel uses; within the random vectors the iid assumption is not made) vector-valued random processes, whereas $\mathbf{H}$ is either assumed to be deterministic or is modeled as an iid (again, only with respect to channel uses) matrix-valued random process. Furthermore, $\mathbf{x}$, $\mathbf{z}$, and $\mathbf{H}$ (if applicable) are assumed to be statistically independent. Note that the assumption of a Gaussian distributed noise vector $\mathbf{z}$ is only made for the special case investigated in Section 5.3 but not in general. The channel is characterized by the conditional distribution of $\mathbf{y}$ given $\mathbf{x}$ via the conditional pdf $f_{\mathbf{y}^{(r)}|\mathbf{x}^{(r)}}(\boldsymbol{\eta}|\boldsymbol{\xi})$ of their real representations $\mathbf{y}^{(r)}$ given $\mathbf{x}^{(r)}$, as well as by a set $\mathcal{X}$ of admissible input distributions. We write $\mathbf{x} \in \mathcal{X}$ if the distribution of $\mathbf{x}$ defined by the pdf $f_{\mathbf{x}^{(r)}}$ is in $\mathcal{X}$. Then, the capacity/noncoherent capacity of (16) is given by the supremum of the mutual information over the set of admissible input distributions [36], i.e., by

$$
C = \sup_{\mathbf{x} \in \mathcal{X}} I(\mathbf{x}; \mathbf{y}).
$$

†Without loss of generality, since an iid (with respect to channel uses) $\mathbf{x}$ is capacity-achieving if $\mathbf{z}$ and $\mathbf{H}$ (if applicable) are iid.

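If, for the case of a random channel matrix, it is additionally assumed that the channel realizations are known to the receiver (but not to the transmitter), the channel output of (16) is the pair

$$
(\mathbf{y}, \mathbf{H}) = (\mathbf{H}\mathbf{x} + \mathbf{z}, \mathbf{H}), \tag{17}
$$

so that the channel law of (17) is governed by the conditional pdf $f_{\mathbf{y}^{(r)},\mathbf{H}^{(r)}|\mathbf{x}^{(r)}}(\boldsymbol{\eta}, \boldsymbol{\chi}|\boldsymbol{\xi})$, where $\mathbf{H}^{(r)}$ is defined by an appropriate stacking of real and imaginary part of $\mathbf{H}$. Therefore, the coherent capacity of (17) is given by

$$
C_c = \sup_{\mathbf{x} \in \mathcal{X}} I(\mathbf{x}; \mathbf{y}, \mathbf{H}) = \sup_{\mathbf{x} \in \mathcal{X}} \int I\big(\mathbf{x}^{(r)}; \mathbf{y}^{(r)}\,\big|\,\mathbf{H}^{(r)} = \boldsymbol{\chi}\big)\, f_{\mathbf{H}^{(r)}}(\boldsymbol{\chi})\, d\boldsymbol{\chi},
$$

where $f_{\mathbf{H}^{(r)}}(\boldsymbol{\chi})$ denotes the pdf of $\mathbf{H}^{(r)}$ and Fubini's Theorem has been used. A random vector $\mathbf{x} \in \mathcal{X}$ is said to be capacity-achieving for (16) or (17), if $I(\mathbf{x}; \mathbf{y}) = C$ or $I(\mathbf{x}; \mathbf{y}, \mathbf{H}) = C_c$, respectively.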

5.1 Circular Noise Vector 

Here, we assume that the noise vector $\mathbf{z} \in \mathbb{C}^n$ is circular and that $\mathcal{X}$ is closed under the operation of forming the circular analog, i.e., that $\mathbf{x} \in \mathcal{X}$ implies $\mathbf{x}_{(a)} \in \mathcal{X}$; in the following, such a set is termed circular-closed. Note that this closedness assumption is a natural assumption, since the operation of forming the circular analog of the first kind preserves both peak and average power constraints, cf. Theorem 3.5, which are the most common constraints for defining $\mathcal{X}$. If $\mathbf{z}$ is Gaussian distributed, it has been shown in [14] that capacity (for deterministic $\mathbf{H}$) and coherent capacity (for random $\mathbf{H}$) are achieved by circular (Gaussian distributed) random vectors, respectively. The proofs are based on Theorem 4.4. The following Theorems 5.1 and 5.3 extend these results to the non-Gaussian case.

Theorem 5.1 Suppose for (16) a deterministic channel matrix $\mathbf{H} \in \mathbb{C}^{n \times m}$, a circular noise vector $\mathbf{z} \in \mathbb{C}^n$, and a circular-closed set $\mathcal{X}$ of admissible input distributions. Then, there exists a circular random vector $\mathbf{x} \in \mathbb{C}^m$ that achieves the capacity of (16).

Proof. Let us denote by $\mathbf{x}' \in \mathcal{X}$ a (not necessarily circular) capacity-achieving random vector. According to (11a), its circular analog $\mathbf{x} = \mathbf{x}'_{(a)} = e^{j2\pi\psi}\mathbf{x}' \in \mathcal{X}$, where $\psi \in [0,1[$ is uniformly distributed and assumed to be independent of $\mathbf{x}'$ and $\mathbf{z}$, satisfies

$$
\begin{aligned}
I(\mathbf{x}; \mathbf{y}) &= h(\mathbf{y}) - h(\mathbf{y}|\mathbf{x}) \\
&= h(\mathbf{H}\mathbf{x} + \mathbf{z}) - h(\mathbf{H}\mathbf{x} + \mathbf{z}|\mathbf{x}) \\
&= h(\mathbf{H}e^{j2\pi\psi}\mathbf{x}' + \mathbf{z}) - h(\mathbf{z}) \\
&= h\big(e^{j2\pi\psi}(\mathbf{H}\mathbf{x}' + e^{-j2\pi\psi}\mathbf{z})\big) - h(\mathbf{z}).
\end{aligned}
$$

Note that $\mathbf{z}_{(a)} = e^{-j2\pi\psi}\mathbf{z}$ has the same distribution as $\mathbf{z}$, cf. Theorem 3.4, and that $\mathbf{z}_{(a)}$ is independent of $\psi$, according to (13). Therefore,

$$
\begin{aligned}
I(\mathbf{x}; \mathbf{y}) &= h\big((\mathbf{H}\mathbf{x}' + \mathbf{z})_{(a)}\big) - h(\mathbf{z}) \\
&\overset{(*)}{\ge} h(\mathbf{H}\mathbf{x}' + \mathbf{z}) - h(\mathbf{z}) \\
&= I(\mathbf{x}'; \mathbf{H}\mathbf{x}' + \mathbf{z}) \\
&= C,
\end{aligned}
$$

where (*) follows from Theorem 4.5. Hence, the circular $\mathbf{x}$ is capacity-achieving. □

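Theorem 5.2 Suppose for (16) a random channel matrix $\mathbf{H} \in \mathbb{C}^{n \times m}$, a circular noise vector $\mathbf{z} \in \mathbb{C}^n$, and a circular-closed set $\mathcal{X}$ of admissible input distributions. Then, there exists a circular random vector $\mathbf{x} \in \mathbb{C}^m$ that achieves the noncoherent capacity of (16).

Proof. Let us denote by $\mathbf{x}' \in \mathcal{X}$ a (not necessarily circular) capacity-achieving random vector and let $\mathbf{x}_{(\theta)} = e^{j2\pi\theta}\mathbf{x}'$ (with $\theta \in [0,1[$ being deterministic). With $\mathbf{y}_{(\theta)} = \mathbf{H}\mathbf{x}_{(\theta)} + \mathbf{z}$ we obtain

$$
\begin{aligned}
I(\mathbf{x}_{(\theta)}; \mathbf{y}_{(\theta)}) &= h(\mathbf{y}_{(\theta)}) - h(\mathbf{y}_{(\theta)}|\mathbf{x}_{(\theta)}) \\
&= h(\mathbf{H}\mathbf{x}_{(\theta)} + \mathbf{z}) - h(\mathbf{H}\mathbf{x}_{(\theta)} + \mathbf{z}|\mathbf{x}_{(\theta)}) \\
&\overset{(*)}{=} h\big(e^{j2\pi\theta}(\mathbf{H}\mathbf{x}' + e^{-j2\pi\theta}\mathbf{z})\big) - \int h\big(\mathbf{H}\mathbf{x}_{(\theta)} + \mathbf{z}\,\big|\,\mathbf{x}_{(\theta)}^{(r)} = \boldsymbol{\xi}\big)\, f_{\mathbf{x}_{(\theta)}^{(r)}}(\boldsymbol{\xi})\, d\boldsymbol{\xi} \\
&= h\big(e^{j2\pi\theta}(\mathbf{H}\mathbf{x}' + e^{-j2\pi\theta}\mathbf{z})\big) - \int h\big(\mathbf{H}e^{j2\pi\theta}\mathbf{x}' + \mathbf{z}\,\big|\,\mathbf{x}'^{(r)} = \boldsymbol{\xi}\big)\, f_{\mathbf{x}'^{(r)}}(\boldsymbol{\xi})\, d\boldsymbol{\xi},
\end{aligned}
$$

where (*) follows from Fubini's Theorem. Since the differential entropy of a complex-valued random vector is invariant with respect to a multiplication with $e^{j2\pi\theta}$,

$$
\begin{aligned}
I(\mathbf{x}_{(\theta)}; \mathbf{y}_{(\theta)}) &= h(\mathbf{H}\mathbf{x}' + e^{-j2\pi\theta}\mathbf{z}) - \int h\big(\mathbf{H}\mathbf{x}' + e^{-j2\pi\theta}\mathbf{z}\,\big|\,\mathbf{x}'^{(r)} = \boldsymbol{\xi}\big)\, f_{\mathbf{x}'^{(r)}}(\boldsymbol{\xi})\, d\boldsymbol{\xi} \\
&\overset{(*)}{=} h(\mathbf{H}\mathbf{x}' + \mathbf{z}) - \int h\big(\mathbf{H}\mathbf{x}' + \mathbf{z}\,\big|\,\mathbf{x}'^{(r)} = \boldsymbol{\xi}\big)\, f_{\mathbf{x}'^{(r)}}(\boldsymbol{\xi})\, d\boldsymbol{\xi} \\
&= I(\mathbf{x}'; \mathbf{y}'),
\end{aligned}
$$

where $\mathbf{y}' = \mathbf{H}\mathbf{x}' + \mathbf{z}$ and (*) follows from the circularity of $\mathbf{z}$. Hence, $\mathbf{x}_{(\theta)}$ is capacity-achieving. It is well known that the mutual information is a concave function with respect to the input distribution for fixed channel law [36]. Therefore, by Jensen's inequality [37], the random vector $\mathbf{x} \in \mathbb{C}^m$ with distribution defined according to $f_{\mathbf{x}^{(r)}}(\boldsymbol{\xi}) = \int_0^1 f_{\mathbf{x}_{(\theta)}^{(r)}}(\boldsymbol{\xi})\, d\theta$ $\lambda_{2m}$-a.e. satisfies

$$
I(\mathbf{x}; \mathbf{y}) \ge \int_0^1 I(\mathbf{x}_{(\theta)}; \mathbf{y}_{(\theta)})\, d\theta = \int_0^1 I(\mathbf{x}'; \mathbf{y}')\, d\theta = I(\mathbf{x}'; \mathbf{y}'),
$$

so that $\mathbf{x}$ achieves the noncoherent capacity of (16). But $f_{\mathbf{x}_{(\theta)}^{(r)}}(\boldsymbol{\xi}) = f_{\mathbf{x}'^{(r)}_{(a)}|\psi}(\boldsymbol{\xi}|\theta)$ $\lambda_{2m}$-a.e., where $\psi$ denotes the uniformly distributed random variable used for defining $\mathbf{x}'_{(a)}$ (see Definition 3.1), and, therefore, $\mathbf{x} = \mathbf{x}'_{(a)} \in \mathcal{X}$. □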



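Theorem 5.3 Suppose for (16) a random channel matrix $\mathbf{H} \in \mathbb{C}^{n \times m}$, a circular noise vector $\mathbf{z} \in \mathbb{C}^n$, and a circular-closed set $\mathcal{X}$ of admissible input distributions. Then, there exists a circular random vector $\mathbf{x} \in \mathbb{C}^m$ that achieves the coherent capacity of (17).

Proof. Let us denote by $\mathbf{x}' \in \mathcal{X}$ a (not necessarily circular) random vector that achieves the coherent capacity of (17). Using the same line of arguments as in the proof of Theorem 5.1, its circular analog $\mathbf{x} = \mathbf{x}'_{(a)} = e^{j2\pi\psi}\mathbf{x}' \in \mathcal{X}$, where $\psi \in [0,1[$ is uniformly distributed and assumed to be independent of $\mathbf{x}'$, $\mathbf{z}$, and $\mathbf{H}$, can be shown to satisfy

$$
I\big(\mathbf{x}^{(r)}; \mathbf{y}^{(r)}\,\big|\,\mathbf{H}^{(r)} = \boldsymbol{\chi}\big) \ge I\big(\mathbf{x}'^{(r)}; \mathbf{y}'^{(r)}\,\big|\,\mathbf{H}^{(r)} = \boldsymbol{\chi}\big),
$$

where $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{z}$ and $\mathbf{y}' = \mathbf{H}\mathbf{x}' + \mathbf{z}$. It follows that

$$
\begin{aligned}
I(\mathbf{x}; \mathbf{y}, \mathbf{H}) &= \int I\big(\mathbf{x}^{(r)}; \mathbf{y}^{(r)}\,\big|\,\mathbf{H}^{(r)} = \boldsymbol{\chi}\big)\, f_{\mathbf{H}^{(r)}}(\boldsymbol{\chi})\, d\boldsymbol{\chi} \\
&\ge \int I\big(\mathbf{x}'^{(r)}; \mathbf{y}'^{(r)}\,\big|\,\mathbf{H}^{(r)} = \boldsymbol{\chi}\big)\, f_{\mathbf{H}^{(r)}}(\boldsymbol{\chi})\, d\boldsymbol{\chi} \\
&= I(\mathbf{x}'; \mathbf{y}', \mathbf{H}),
\end{aligned}
$$

i.e., $\mathbf{x}$ achieves the coherent capacity of (17). □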
5.2 Circular Channel Matrix 

Here, we assume that the channel matrix $\mathbf{H} \in \mathbb{C}^{n \times m}$ is random and, additionally, that an arbitrary stacking of the elements of $\mathbf{H}$ into an $nm$-dimensional vector yields a circular random vector. The noise vector $\mathbf{z}$ is not required to be circular. Note that this is the opposite situation compared with Section 5.1, where $\mathbf{z}$ is circular but $\mathbf{H}$ is arbitrary. Again, it is assumed that the set $\mathcal{X}$ of admissible input distributions is circular-closed. For the input distributions that achieve the noncoherent capacity of (16) and the coherent capacity of (17), respectively, we have the following results.

Theorem 5.4 Suppose for (16) a random channel matrix $\mathbf{H} \in \mathbb{C}^{n \times m}$, such that the random vector, which is obtained from an arbitrary stacking of the elements of $\mathbf{H}$ into an $nm$-dimensional vector, is circular, and a circular-closed set $\mathcal{X}$ of admissible input distributions. Then, there exists a circular random vector $\mathbf{x} \in \mathbb{C}^m$ that achieves the noncoherent capacity of (16).

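Proof. Let us denote by $\mathbf{x}' \in \mathcal{X}$ a (not necessarily circular) random vector that achieves the noncoherent capacity of (16), and let $\mathbf{x} = \mathbf{x}'_{(a)} = e^{j2\pi\psi}\mathbf{x}' \in \mathcal{X}$, where $\psi \in [0,1[$ is uniformly distributed and assumed to be independent of $\mathbf{x}'$, $\mathbf{z}$, and $\mathbf{H}$, be its circular analog. We have

$$
\begin{aligned}
I(\mathbf{x}; \mathbf{y}) &= h(\mathbf{H}\mathbf{x} + \mathbf{z}) - h(\mathbf{H}\mathbf{x} + \mathbf{z}|\mathbf{x}), \\
I(\mathbf{x}'; \mathbf{y}') &= h(\mathbf{H}\mathbf{x}' + \mathbf{z}) - h(\mathbf{H}\mathbf{x}' + \mathbf{z}|\mathbf{x}'),
\end{aligned}
$$

and intend to show $I(\mathbf{x}; \mathbf{y}) = I(\mathbf{x}'; \mathbf{y}')$. Due to the circularity of $\mathbf{H}$, Theorem 3.4 implies

$$
h(\mathbf{H}\mathbf{x} + \mathbf{z}) = h(e^{j2\pi\psi}\mathbf{H}\mathbf{x}' + \mathbf{z}) = h(\mathbf{H}\mathbf{x}' + \mathbf{z}),
$$

so that it remains to show $h(\mathbf{H}\mathbf{x} + \mathbf{z}|\mathbf{x}) = h(\mathbf{H}\mathbf{x}' + \mathbf{z}|\mathbf{x}')$. Fubini's Theorem yields

$$
h(\mathbf{H}\mathbf{x} + \mathbf{z}|\mathbf{x}) = \int_0^1\!\!\int h\big(\mathbf{H}e^{j2\pi\varphi}\mathbf{x}' + \mathbf{z}\,\big|\,(e^{j2\pi\varphi}\mathbf{x}')^{(r)} = \boldsymbol{\xi}\big)\, f_{(e^{j2\pi\varphi}\mathbf{x}')^{(r)}}(\boldsymbol{\xi})\, d\boldsymbol{\xi}\, d\varphi,
$$

since $f_{\mathbf{x}^{(r)}}(\boldsymbol{\xi}) = \int_0^1 f_{\mathbf{x}^{(r)}|\psi}(\boldsymbol{\xi}|\varphi)\, d\varphi = \int_0^1 f_{(e^{j2\pi\varphi}\mathbf{x}')^{(r)}}(\boldsymbol{\xi})\, d\varphi$, and, furthermore,

$$
\begin{aligned}
h(\mathbf{H}\mathbf{x} + \mathbf{z}|\mathbf{x}) &= \int_0^1\!\!\int h\big(\mathbf{H}e^{j2\pi\varphi}\mathbf{x}' + \mathbf{z}\,\big|\,\mathbf{x}'^{(r)} = \boldsymbol{\xi}\big)\, f_{\mathbf{x}'^{(r)}}(\boldsymbol{\xi})\, d\boldsymbol{\xi}\, d\varphi \\
&\overset{(*)}{=} \int_0^1\!\!\int h\big(\mathbf{H}\mathbf{x}' + \mathbf{z}\,\big|\,\mathbf{x}'^{(r)} = \boldsymbol{\xi}\big)\, f_{\mathbf{x}'^{(r)}}(\boldsymbol{\xi})\, d\boldsymbol{\xi}\, d\varphi \\
&= \int_0^1 h(\mathbf{H}\mathbf{x}' + \mathbf{z}|\mathbf{x}')\, d\varphi \\
&= h(\mathbf{H}\mathbf{x}' + \mathbf{z}|\mathbf{x}'),
\end{aligned}
$$

where (*) follows from the circularity of $\mathbf{H}$. Hence, the circular $\mathbf{x}$ achieves the noncoherent capacity of (16). □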

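Theorem 5.5 Suppose for (16) a random channel matrix $\mathbf{H} \in \mathbb{C}^{n \times m}$, such that the random vector, which is obtained from an arbitrary stacking of the elements of $\mathbf{H}$ into an $nm$-dimensional vector, is circular, and a circular-closed set $\mathcal{X}$ of admissible input distributions. Then, there exists a circular random vector $\mathbf{x} \in \mathbb{C}^m$ that achieves the coherent capacity of (17).

Proof. Let us denote by $\mathbf{x}' \in \mathcal{X}$ a (not necessarily circular) capacity-achieving random vector and let $\mathbf{x}_{(\theta)} = e^{j2\pi\theta}\mathbf{x}'$ (with $\theta \in [0,1[$ being deterministic). With $\mathbf{y}_{(\theta)} = \mathbf{H}\mathbf{x}_{(\theta)} + \mathbf{z}$ we obtain

$$
\begin{aligned}
I(\mathbf{x}_{(\theta)}; \mathbf{y}_{(\theta)}, \mathbf{H}) &= \int I\big(\mathbf{x}_{(\theta)}^{(r)}; \mathbf{y}_{(\theta)}^{(r)}\,\big|\,\mathbf{H}^{(r)} = \boldsymbol{\chi}\big)\, f_{\mathbf{H}^{(r)}}(\boldsymbol{\chi})\, d\boldsymbol{\chi} \\
&= \int \Big( h\big(\mathbf{H}\mathbf{x}_{(\theta)} + \mathbf{z}\,\big|\,\mathbf{H}^{(r)} = \boldsymbol{\chi}\big) - h\big((\mathbf{H}\mathbf{x}_{(\theta)} + \mathbf{z}|\mathbf{x}_{(\theta)})\,\big|\,\mathbf{H}^{(r)} = \boldsymbol{\chi}\big) \Big) f_{\mathbf{H}^{(r)}}(\boldsymbol{\chi})\, d\boldsymbol{\chi} \\
&= \int \Big( h\big(\mathbf{H}e^{j2\pi\theta}\mathbf{x}' + \mathbf{z}\,\big|\,\mathbf{H}^{(r)} = \boldsymbol{\chi}\big) - h\big((\mathbf{H}e^{j2\pi\theta}\mathbf{x}' + \mathbf{z}|\mathbf{x}')\,\big|\,\mathbf{H}^{(r)} = \boldsymbol{\chi}\big) \Big) f_{\mathbf{H}^{(r)}}(\boldsymbol{\chi})\, d\boldsymbol{\chi} \\
&= \int \Big( h\big(\mathbf{H}\mathbf{x}' + \mathbf{z}\,\big|\,\mathbf{H}^{(r)} = \boldsymbol{\chi}\big) - h\big((\mathbf{H}\mathbf{x}' + \mathbf{z}|\mathbf{x}')\,\big|\,\mathbf{H}^{(r)} = \boldsymbol{\chi}\big) \Big) f_{(e^{j2\pi\theta}\mathbf{H})^{(r)}}(\boldsymbol{\chi})\, d\boldsymbol{\chi},
\end{aligned}
$$

where Fubini's Theorem has been used, and, furthermore, due to the circularity of $\mathbf{H}$,

$$
I(\mathbf{x}_{(\theta)}; \mathbf{y}_{(\theta)}, \mathbf{H}) = \int I\big(\mathbf{x}'^{(r)}; \mathbf{y}'^{(r)}\,\big|\,\mathbf{H}^{(r)} = \boldsymbol{\chi}\big)\, f_{\mathbf{H}^{(r)}}(\boldsymbol{\chi})\, d\boldsymbol{\chi} = I(\mathbf{x}'; \mathbf{y}', \mathbf{H}),
$$

where $\mathbf{y}' = \mathbf{H}\mathbf{x}' + \mathbf{z}$. Hence, $\mathbf{x}_{(\theta)}$ is capacity-achieving. Therefore, by applying Jensen's inequality to the concave mutual information function (with respect to the input distribution, cf. the proof of Theorem 5.2), the random vector $\mathbf{x} \in \mathbb{C}^m$ with distribution defined according to $f_{\mathbf{x}^{(r)}}(\boldsymbol{\xi}) = \int_0^1 f_{\mathbf{x}_{(\theta)}^{(r)}}(\boldsymbol{\xi})\, d\theta$ $\lambda_{2m}$-a.e. satisfies

$$
I(\mathbf{x}; \mathbf{y}, \mathbf{H}) \ge \int_0^1 I(\mathbf{x}_{(\theta)}; \mathbf{y}_{(\theta)}, \mathbf{H})\, d\theta = I(\mathbf{x}'; \mathbf{y}', \mathbf{H}),
$$

so that $\mathbf{x}$ achieves the coherent capacity of (17). But, as in the proof of Theorem 5.2, $f_{\mathbf{x}_{(\theta)}^{(r)}}(\boldsymbol{\xi}) = f_{\mathbf{x}'^{(r)}_{(a)}|\psi}(\boldsymbol{\xi}|\theta)$ $\lambda_{2m}$-a.e., where $\psi$ denotes the uniformly distributed random variable used for defining $\mathbf{x}'_{(a)}$ (see Definition 3.1), and, therefore, $\mathbf{x} = \mathbf{x}'_{(a)} \in \mathcal{X}$. □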
5.3 Deterministic Channel Matrix and Improper Gaussian Noise Vector

Here, we investigate the case that the channel matrix $\mathbf{H} \in \mathbb{C}^{n \times m}$ is deterministic and that the noise vector $\mathbf{z} \in \mathbb{C}^n$ is Gaussian distributed. We impose an average power constraint, i.e., we define the set of admissible input distributions as

$$
\mathcal{X} \triangleq \big\{\mathbf{x} : E\{\mathbf{x}^H\mathbf{x}\} \le S\big\}. \tag{18}
$$

For $\mathbf{z}$ proper, both capacity and capacity-achieving input vector are well known [14]. Therefore, in the following, we consider a more general situation without the assumption of $\mathbf{z}$ being proper. However, we introduce additional technical assumptions, which make the derivation less complicated and lead to simpler results. Note that most of these assumptions could be significantly relaxed or even omitted, but at the price of more involved theorems and proofs. We assume

$$
\mathbf{H} \in \mathbb{C}^{n \times n} \text{ deterministic, square, and non-singular}, \tag{19a}
$$

$$
S > 2n\, \|\mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H}\|_2 \quad \text{(high signal-to-noise ratio)}, \tag{19b}
$$

$$
\mathbf{z} \text{ zero-mean with non-singular } \mathbf{C}_{\mathbf{z}} \in \mathbb{C}^{n \times n}, \tag{19c}
$$

$$
\|\mathbf{B}_{\mathbf{z}}^{-1}\mathbf{P}_{\mathbf{z}}\mathbf{B}_{\mathbf{z}}^{-T}\|_2 < 1, \tag{19d}
$$

where $\mathbf{C}_{\mathbf{z}}$ and $\mathbf{P}_{\mathbf{z}}$ denote covariance matrix and complementary covariance matrix of $\mathbf{z}$, respectively, and $\mathbf{B}_{\mathbf{z}}$ is a generalized Cholesky factor of $\mathbf{C}_{\mathbf{z}}$. We have the following capacity result.

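Theorem 5.6 Suppose for (16) that assumptions (19) hold and that the set of admissible input distributions is defined according to (18). Then, the capacity of (16) is given by

$$
C = 2\log|\det\mathbf{H}| + n\log\big(S + \mathrm{tr}\{\mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H}\}\big) - \log\det\mathbf{C}_{\mathbf{z}} - \frac{1}{2}\sum_{i=1}^{n}\log(1 - \lambda_i^2) - n\log n,
$$

where $\mathrm{tr}\{\cdot\}$ denotes the usual matrix trace and $\lambda_i$ are the singular values of $\mathbf{B}_{\mathbf{z}}^{-1}\mathbf{P}_{\mathbf{z}}\mathbf{B}_{\mathbf{z}}^{-T}$. Furthermore, the zero-mean and Gaussian distributed random vector $\mathbf{x} \in \mathbb{C}^n$ with covariance matrix and complementary covariance matrix given by

$$
\mathbf{C}_{\mathbf{x}} = \frac{1}{n}\big(S + \mathrm{tr}\{\mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H}\}\big)\mathbf{I}_n - \mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H}, \tag{20a}
$$

$$
\mathbf{P}_{\mathbf{x}} = -\mathbf{H}^{-1}\mathbf{P}_{\mathbf{z}}\mathbf{H}^{-T}, \tag{20b}
$$

respectively, is capacity-achieving.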

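Proof. Since $E\{\mathbf{x}^H\mathbf{x}\} = \mathrm{tr}\{\mathbf{C}_{\mathbf{x}}\} + \|\mathbf{m}_{\mathbf{x}}\|_2^2$, where $\mathbf{m}_{\mathbf{x}}$ denotes the mean vector of $\mathbf{x}$, Theorem 4.4 implies that the supremum of

$$
I(\mathbf{x}; \mathbf{y}) = h(\mathbf{y}) - h(\mathbf{y}|\mathbf{x}) = h(\mathbf{H}\mathbf{x} + \mathbf{z}) - h(\mathbf{z})
$$

over $\mathcal{X}$ is achieved by a zero-mean and Gaussian distributed complex-valued random vector $\mathbf{x}$ with covariance matrix $\mathbf{C}_{\mathbf{x}}$ that maximizes the function $g(\mathbf{C}) = \log\det(\mathbf{H}\mathbf{C}\mathbf{H}^H + \mathbf{C}_{\mathbf{z}})$ over the set of covariance matrices $\mathbf{C}$ with $\mathrm{tr}\{\mathbf{C}\} \le S$, and with complementary covariance matrix $\mathbf{P}_{\mathbf{x}}$ that satisfies $\mathbf{H}\mathbf{P}_{\mathbf{x}}\mathbf{H}^T + \mathbf{P}_{\mathbf{z}} = \mathbf{0}$, provided that such a random vector exists. Using the eigenvalue decomposition $\mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H} = \mathbf{U}\mathbf{D}\mathbf{U}^H$ we obtain

$$
\begin{aligned}
g(\mathbf{C}) &= 2\log|\det\mathbf{H}| + \log\det(\mathbf{C} + \mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H}) \\
&= 2\log|\det\mathbf{H}| + \log\det(\mathbf{C} + \mathbf{U}\mathbf{D}\mathbf{U}^H) \\
&= 2\log|\det\mathbf{H}| + \log\det(\mathbf{U}^H\mathbf{C}\mathbf{U} + \mathbf{D}),
\end{aligned}
$$

so that its maximum is achieved at $\mathbf{C} = \mathbf{C}_{\mathbf{x}} = \mathbf{U}(L\mathbf{I}_n - \mathbf{D})\mathbf{U}^H = L\mathbf{I}_n - \mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H}$, where $L$ is chosen such that

$$
S = \mathrm{tr}\{\mathbf{C}_{\mathbf{x}}\} = Ln - \mathrm{tr}\{\mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H}\} \tag{21}
$$

is satisfied. Note that this is the well-known water filling solution [14,33], with the additional simplification that $\mathbf{C}_{\mathbf{x}}$ is non-singular (the water level $L$ is larger than the noise power for all parallel eigenchannels), since, according to (19b),

$$
\frac{S}{n} > 2\,\|\mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H}\|_2 = 2\,\|\mathbf{D}\|_2 = 2 d_{\max} > d_{\max}, \tag{22}
$$

where $d_{\max}$ is the largest entry (eigenvalue) of $\mathbf{D}$. This yields (20a). Clearly, the choice (20b) satisfies $\mathbf{H}\mathbf{P}_{\mathbf{x}}\mathbf{H}^T + \mathbf{P}_{\mathbf{z}} = \mathbf{0}$. It remains to show that $\{\mathbf{C}_{\mathbf{x}}, \mathbf{P}_{\mathbf{x}}\}$ is a valid pair of covariance matrix and complementary covariance matrix. To that end, consider

$$
\begin{aligned}
\|\mathbf{P}_{\mathbf{x}}\|_2 &= \|\mathbf{H}^{-1}\mathbf{P}_{\mathbf{z}}\mathbf{H}^{-T}\|_2 &\text{(23a)} \\
&= \big\|\mathbf{H}^{-1}\mathbf{B}_{\mathbf{z}}\big(\mathbf{B}_{\mathbf{z}}^{-1}\mathbf{P}_{\mathbf{z}}\mathbf{B}_{\mathbf{z}}^{-T}\big)\mathbf{B}_{\mathbf{z}}^{T}\mathbf{H}^{-T}\big\|_2 \\
&\overset{(*)}{\le} \|\mathbf{H}^{-1}\mathbf{B}_{\mathbf{z}}\|_2^2 \\
&= \|\mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H}\|_2 \\
&= \|\mathbf{D}\|_2 \\
&= d_{\max}, &\text{(23b)}
\end{aligned}
$$

where (*) follows from Theorem 2.11, and note that $L > \frac{S}{n} > 2 d_{\max}$, cf. (21) and (22). This implies

$$
\|\mathbf{C}_{\mathbf{x}}^{-1}\|_2 = \frac{1}{L - d_{\max}} < \frac{1}{d_{\max}},
$$

and, furthermore,

$$
\|\mathbf{B}_{\mathbf{x}}^{-1}\mathbf{P}_{\mathbf{x}}\mathbf{B}_{\mathbf{x}}^{-T}\|_2 \le \|\mathbf{C}_{\mathbf{x}}^{-1}\|_2\, \|\mathbf{P}_{\mathbf{x}}\|_2 < 1,
$$

so that Theorem 2.11 shows that (20) defines a valid pair of covariance matrix and complementary covariance matrix. Finally, the capacity of (16) is obtained as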

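$$
\begin{aligned}
C &= g(\mathbf{C}_{\mathbf{x}}) + n\log(\pi e) - h(\mathbf{z}) \\
&= 2\log|\det\mathbf{H}| + n\log\Big(\frac{1}{n}\big(S + \mathrm{tr}\{\mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H}\}\big)\Big) + n\log(\pi e) - \log\det(\pi e\, \mathbf{C}_{\mathbf{z}}) - \frac{1}{2}\sum_{i=1}^{n}\log(1 - \lambda_i^2) \\
&= 2\log|\det\mathbf{H}| + n\log\big(S + \mathrm{tr}\{\mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H}\}\big) - n\log n - \log\det\mathbf{C}_{\mathbf{z}} - \frac{1}{2}\sum_{i=1}^{n}\log(1 - \lambda_i^2),
\end{aligned}
$$

where Theorem 4.8 has been used. □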
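The capacity expression of Theorem 5.6 can be cross-checked against a direct evaluation of $h(\mathbf{y}) - h(\mathbf{z})$ through real representations. The following sketch is only an illustration: the channel matrix `H`, the noise construction via `A` and `B_mix`, and the power `S` are hypothetical values chosen to satisfy assumptions (19), the function names are not from the paper, and all logarithms are natural.

```python
import numpy as np

def real_cov(C, P):
    """Covariance of [Re v; Im v] from covariance C and complementary covariance P."""
    return 0.5 * np.block([[np.real(C + P), np.imag(P - C)],
                           [np.imag(C + P), np.real(C - P)]])

def gaussian_entropy(C, P):
    """Entropy (nats) of a zero-mean Gaussian vector via its real representation."""
    return 0.5 * np.linalg.slogdet(2 * np.pi * np.e * real_cov(C, P))[1]

def capacity_thm56(H, Cz, Pz, S):
    """Closed-form capacity of Theorem 5.6."""
    n = H.shape[0]
    H_inv = np.linalg.inv(H)
    N_eff = H_inv @ Cz @ H_inv.conj().T
    Bz_inv = np.linalg.inv(np.linalg.cholesky(Cz))
    lam = np.linalg.svd(Bz_inv @ Pz @ Bz_inv.T, compute_uv=False)
    return (2 * np.log(abs(np.linalg.det(H)))
            + n * np.log(S + np.trace(N_eff).real)
            - np.linalg.slogdet(Cz)[1]
            - 0.5 * np.sum(np.log(1 - lam ** 2))
            - n * np.log(n))

def optimal_input(H, Cz, Pz, S):
    """Covariance (20a) and complementary covariance (20b) of the optimal input."""
    n = H.shape[0]
    H_inv = np.linalg.inv(H)
    N_eff = H_inv @ Cz @ H_inv.conj().T
    Cx = (S + np.trace(N_eff).real) / n * np.eye(n) - N_eff
    Px = -H_inv @ Pz @ H_inv.T
    return Cx, Px

def mutual_information(H, Cx, Px, Cz, Pz):
    """I(x; y) = h(y) - h(z) for Gaussian input and Gaussian noise."""
    Cy = H @ Cx @ H.conj().T + Cz
    Py = H @ Px @ H.T + Pz
    return gaussian_entropy(Cy, Py) - gaussian_entropy(Cz, Pz)

# Hypothetical improper noise z = A u + B_mix conj(u) and channel H.
A = np.array([[1.0, 0.3j], [0.2, 0.8]])
B_mix = np.array([[0.4, 0.0], [0.1j, 0.3]])
Cz = A @ A.conj().T + B_mix @ B_mix.conj().T
Pz = A @ B_mix.T + B_mix @ A.T
H = np.array([[1.0, 0.5j], [0.2, 1.5]])
n = H.shape[0]
H_inv = np.linalg.inv(H)
S = 1.1 * 2 * n * np.linalg.norm(H_inv @ Cz @ H_inv.conj().T, 2)  # satisfies (19b)

Cx, Px = optimal_input(H, Cz, Pz, S)
C_formula = capacity_thm56(H, Cz, Pz, S)
I_opt = mutual_information(H, Cx, Px, Cz, Pz)
I_proper = mutual_information(H, Cx, np.zeros_like(Px), Cz, Pz)
print(C_formula, I_opt, I_proper)
```

With the input of (20a) and (20b), the directly computed mutual information matches the closed form, whereas a proper input with the same covariance matrix yields a smaller value.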
Remarks. Whereas in many real-world scenarios the noise vector happens to be circular, so that the results of Section 5.1 apply, there are also practically relevant scenarios where the noise vector is known to be improper. More specifically, DMT modulation, which is widely used in xDSL applications [18], yields an equivalent system channel that exactly matches the situation considered here [8,9]. We also note that capacity results for improper Gaussian distributed noise vectors could be alternatively derived by making use of an equivalent real-valued channel of dimension $2n \times 2m$ that is obtained by appropriate stacking of real and imaginary parts. The advantage of the approach presented here is that it yields expressions that are explicit in covariance matrix and complementary covariance matrix. In the following, we make use of this (desired) separation.

Observe that improper noise is beneficial since, due to $\frac{1}{2}\sum_{i=1}^{n}\log(1 - \lambda_i^2) < 0$, it increases capacity. However, this presupposes a suitably designed transmission scheme. If it is erroneously believed that $\mathbf{P}_{\mathbf{z}} = \mathbf{0}$, it will be erroneously believed as well (see Theorem 5.6) that the zero-mean and Gaussian distributed random vector $\mathbf{x}'$ with covariance matrix $\mathbf{C}_{\mathbf{x}'} = \mathbf{C}_{\mathbf{x}}$ as in (20a) but with $\mathbf{P}_{\mathbf{x}'} = \mathbf{0}$ is capacity-achieving. It follows that

$$
C' = I(\mathbf{x}'; \mathbf{y}') = C - \Delta C \le C,
$$

where $\mathbf{y}' = \mathbf{H}\mathbf{x}' + \mathbf{z}$ and $\Delta C = C - C'$ denotes the resulting capacity loss. This capacity loss is quantified by the next theorem.

Theorem 5.7 Suppose for (16) that assumptions (19) hold and that the set of admissible input distributions is defined according to (18). Then, the capacity loss $\Delta C$ that occurs if it is erroneously believed that $\mathbf{P}_{\mathbf{z}} = \mathbf{0}$ is given by

$$
\Delta C = -\frac{1}{2}\sum_{i=1}^{n}\log(1 - \mu_i^2),
$$

where $\mu_i$ are the singular values of $\big(n / (S + \mathrm{tr}\{\mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H}\})\big)\, \mathbf{H}^{-1}\mathbf{P}_{\mathbf{z}}\mathbf{H}^{-T}$. In particular,

$$
0 \le \Delta C < n\log\frac{2}{\sqrt{3}}.
$$

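Proof. We intend to apply Theorem 4.8 to the random vector $\mathbf{y}' = \mathbf{H}\mathbf{x}' + \mathbf{z}$. In order to meet the assumption of Theorem 4.8, we have to show $\|\mathbf{B}_{\mathbf{y}'}^{-1}\mathbf{P}_{\mathbf{y}'}\mathbf{B}_{\mathbf{y}'}^{-T}\|_2 < 1$, where $\mathbf{B}_{\mathbf{y}'}$ denotes a generalized Cholesky factor of $\mathbf{C}_{\mathbf{y}'}$. Note that the non-singularity of $\mathbf{C}_{\mathbf{y}'}$ follows from the non-singularity of $\mathbf{C}_{\mathbf{z}}$. Clearly, $\mathbf{C}_{\mathbf{y}'} = \mathbf{H}\mathbf{C}_{\mathbf{x}}\mathbf{H}^H + \mathbf{C}_{\mathbf{z}} = \frac{1}{n}\big(S + \mathrm{tr}\{\mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H}\}\big)\mathbf{H}\mathbf{H}^H$ and $\mathbf{P}_{\mathbf{y}'} = \mathbf{P}_{\mathbf{z}}$, so that $\mathbf{B}_{\mathbf{y}'}^{-1}\mathbf{P}_{\mathbf{y}'}\mathbf{B}_{\mathbf{y}'}^{-T} = \big(n / (S + \mathrm{tr}\{\mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H}\})\big)\, \mathbf{H}^{-1}\mathbf{P}_{\mathbf{z}}\mathbf{H}^{-T}$ and, furthermore,

$$
\begin{aligned}
\|\mathbf{B}_{\mathbf{y}'}^{-1}\mathbf{P}_{\mathbf{y}'}\mathbf{B}_{\mathbf{y}'}^{-T}\|_2
&= \frac{n}{S + \mathrm{tr}\{\mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H}\}}\, \|\mathbf{H}^{-1}\mathbf{P}_{\mathbf{z}}\mathbf{H}^{-T}\|_2 \\
&\overset{(23a),(23b)}{\le} \frac{n}{S + \mathrm{tr}\{\mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H}\}}\, \|\mathbf{H}^{-1}\mathbf{C}_{\mathbf{z}}\mathbf{H}^{-H}\|_2 \\
&\overset{(22)}{<} \frac{1}{2}.
\end{aligned}
$$

The capacity loss is then given by

$$
\begin{aligned}
\Delta C &= C - C' \\
&= h(\mathbf{y}) - h(\mathbf{z}) - h(\mathbf{y}') + h(\mathbf{z}) \\
&= h(\mathbf{y}) - h(\mathbf{y}') \\
&\overset{(*)}{=} \log\det(\pi e\, \mathbf{C}_{\mathbf{y}}) - \log\det(\pi e\, \mathbf{C}_{\mathbf{y}'}) - \frac{1}{2}\sum_{i=1}^{n}\log(1 - \mu_i^2) \\
&= -\frac{1}{2}\sum_{i=1}^{n}\log(1 - \mu_i^2),
\end{aligned}
$$

where (*) follows from Theorem 4.8, since $\mathbf{P}_{\mathbf{y}} = \mathbf{0}$ and $\mathbf{C}_{\mathbf{y}'} = \mathbf{C}_{\mathbf{y}}$. For the bound note that

$$
-\frac{1}{2}\sum_{i=1}^{n}\log(1 - \mu_i^2) \le -\frac{n}{2}\log\big(1 - \|\mathbf{B}_{\mathbf{y}'}^{-1}\mathbf{P}_{\mathbf{y}'}\mathbf{B}_{\mathbf{y}'}^{-T}\|_2^2\big) < -\frac{n}{2}\log\Big(1 - \frac{1}{4}\Big) = n\log\frac{2}{\sqrt{3}}.
$$

□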
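The capacity loss of Theorem 5.7 can likewise be compared with a direct entropy difference $h(\mathbf{y}) - h(\mathbf{y}')$. The sketch below is an illustration under assumed values: `H`, the noise matrices, and `S` are hypothetical and chosen to satisfy (19), the helper names are not from the paper, and natural logarithms are used throughout.

```python
import numpy as np

def real_cov(C, P):
    """Covariance of [Re v; Im v] from covariance C and complementary covariance P."""
    return 0.5 * np.block([[np.real(C + P), np.imag(P - C)],
                           [np.imag(C + P), np.real(C - P)]])

def gaussian_entropy(C, P):
    """Entropy (nats) of a zero-mean Gaussian vector via its real representation."""
    return 0.5 * np.linalg.slogdet(2 * np.pi * np.e * real_cov(C, P))[1]

def capacity_loss_thm57(H, Cz, Pz, S):
    """Closed-form capacity loss: -0.5 * sum log(1 - mu_i^2)."""
    n = H.shape[0]
    H_inv = np.linalg.inv(H)
    tr_N = np.trace(H_inv @ Cz @ H_inv.conj().T).real
    mu = np.linalg.svd(n / (S + tr_N) * (H_inv @ Pz @ H_inv.T), compute_uv=False)
    return -0.5 * np.sum(np.log(1 - mu ** 2))

# Hypothetical improper noise and channel (same construction idea as for Theorem 5.6).
A = np.array([[1.0, 0.3j], [0.2, 0.8]])
B_mix = np.array([[0.4, 0.0], [0.1j, 0.3]])
Cz = A @ A.conj().T + B_mix @ B_mix.conj().T
Pz = A @ B_mix.T + B_mix @ A.T
H = np.array([[1.0, 0.5j], [0.2, 1.5]])
n = H.shape[0]
H_inv = np.linalg.inv(H)
N_eff = H_inv @ Cz @ H_inv.conj().T
S = 1.1 * 2 * n * np.linalg.norm(N_eff, 2)   # satisfies (19b)

# Output covariance is the same for the optimal input x and the mismatched input x'.
Cy = (S + np.trace(N_eff).real) / n * (H @ H.conj().T)
h_y = gaussian_entropy(Cy, np.zeros_like(Pz))   # optimal input: P_y = 0
h_y_mismatch = gaussian_entropy(Cy, Pz)         # mismatched input: P_y' = P_z

loss_direct = h_y - h_y_mismatch
loss_formula = capacity_loss_thm57(H, Cz, Pz, S)
loss_bound = n * np.log(2 / np.sqrt(3))
print(loss_direct, loss_formula, loss_bound)
```

The loss stays below $n\log(2/\sqrt{3})$ because assumption (19b) keeps every $\mu_i$ below $1/2$.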

Example. Let us consider the special case of the complex scalar channel $y = x + z$ with noise covariance $C_z \in \mathbb{R}$ and complementary noise covariance $P_z \in \mathbb{R}$, where $C_z > P_z > 0$. According to (3),

$$
\mathbf{C}_{z^{(r)}} = \frac{1}{2}\begin{pmatrix} C_z + P_z & 0 \\ 0 & C_z - P_z \end{pmatrix},
$$

which is illustrated in Fig. 1(a). It is seen that the noise power is different for real and imaginary part. If it is erroneously believed that $P_z = 0$, the same power is assigned to real and imaginary part of the input vector, as shown in Fig. 1(b). However, the optimum power distribution that maximizes the mutual information is different; it is depicted in Fig. 1(c). Note that this capacity-achieving power distribution is obtained by water filling on a real and imaginary part level. The difference between the mutual informations of solutions (b) and (c) is expressed by the capacity loss $\Delta C$.

[Figure 1, panels (a) to (c); recoverable panel label: "correct water filling"]

Figure 1: Water filling strategies illustrating the capacity loss.
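The scalar example can be worked through numerically. The sketch below is a minimal illustration with assumed values for `S`, `Cz`, and `Pz` (chosen such that $S > 2C_z$, i.e., assumption (19b) with $n = 1$ and $H = 1$); the function names are hypothetical and natural logarithms are used. It compares water filling over the real and imaginary noise components with an equal power split, and checks the result against the closed form of Theorem 5.7, which for this channel reduces to $\mu = P_z / (S + C_z)$.

```python
import math

def capacity_loss_water_filling(S, Cz, Pz):
    """Loss of an equal power split relative to water filling over real/imaginary parts."""
    var_re = (Cz + Pz) / 2          # noise variance of the real part
    var_im = (Cz - Pz) / 2          # noise variance of the imaginary part
    level = (S + Cz) / 2            # water level; both components are active for S > Pz
    c_opt = 0.5 * math.log(level / var_re) + 0.5 * math.log(level / var_im)
    c_equal = (0.5 * math.log(1 + (S / 2) / var_re)
               + 0.5 * math.log(1 + (S / 2) / var_im))
    return c_opt - c_equal

def capacity_loss_formula(S, Cz, Pz):
    """Theorem 5.7 for n = 1 and H = 1: -0.5 * log(1 - mu^2) with mu = Pz / (S + Cz)."""
    mu = Pz / (S + Cz)
    return -0.5 * math.log(1 - mu ** 2)

S, Cz, Pz = 4.0, 1.0, 0.6            # assumed example values
loss_wf = capacity_loss_water_filling(S, Cz, Pz)
loss_formula = capacity_loss_formula(S, Cz, Pz)
print(loss_wf, loss_formula)
```

Both routes give the same loss, which confirms that the mismatch penalty in this example is exactly the gap between panels (b) and (c) of Fig. 1.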

6 Conclusion 

We studied the influence of circularity/non-circularity and properness/improperness on important information theoretic quantities such as entropy, divergence, and capacity. A theorem by Neeser & Massey [13] served as a motivating starting point; it states that the entropy of a zero-mean complex-valued random vector is upper-bounded by the entropy of a circular/proper Gaussian distributed random vector with the same covariance matrix. We strengthened this theorem in two different directions: (i) we dropped the Gaussian assumption and (ii) we dropped the properness assumption. In both cases the resulting upper bound turned out to be tighter than the one previously known. A key ingredient for the proof in case (i) was the introduction of the circular analog of a given complex-valued random vector. Whereas its definition was based on intuitive arguments to obtain a circular random vector that is "close" to the (potentially) non-circular given one, we rigorously proved that it equals the unique circular random vector with minimum Kullback-Leibler divergence. On the other hand, for (ii), we exploited results about the second-order structure of complex-valued random vectors that were obtained without making use of the augmented covariance matrix (in contrast to related work). Additionally, we presented a criterion for a matrix to be a valid complementary covariance matrix. Furthermore, we addressed the capacity of MIMO channels. Regardless of the specific distribution of the channel parameters (noise vector and channel matrix, if modeled as random), we showed that the capacity-achieving input vector is circular for a broad range of MIMO channels (including coherent and noncoherent scenarios). This extends known results that make use of a Gaussian assumption. Finally, we investigated the situation of an improper and Gaussian distributed noise vector. We computed both capacity and capacity-achieving input vector and showed that improperness increases capacity, provided that the complementary covariance matrix is exploited. Otherwise, a capacity loss occurs, for which we derived an explicit expression.